From 6ec1b314aae6910739203d7385e1f41b525aa9ae Mon Sep 17 00:00:00 2001 From: Andre Date: Tue, 3 Feb 2026 03:35:49 -0500 Subject: [PATCH 1/7] chore(repo): rename license file and move lint requirements Keep license/README references consistent and relocate Python lint requirements under scripts/. --- LICENSE => LICENSE.txt | 0 README.md | 4 ++-- requirements-lint.txt => scripts/requirements-lint.txt | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) rename LICENSE => LICENSE.txt (100%) rename requirements-lint.txt => scripts/requirements-lint.txt (83%) diff --git a/LICENSE b/LICENSE.txt similarity index 100% rename from LICENSE rename to LICENSE.txt diff --git a/README.md b/README.md index c63aa115..5049ae24 100644 --- a/README.md +++ b/README.md @@ -291,7 +291,7 @@ This license provides: - 🛡️ **Defensive termination** - deters frivolous patent litigation. - 🌍 **Industry standard** - widely recognized and trusted. -For the complete license text, see [LICENSE](LICENSE). +For the complete license text, see [LICENSE](LICENSE.txt). ## Security @@ -308,4 +308,4 @@ For detailed security information, see the [Security Architecture](docs/tech_spe [workflow-go-ci]: https://github.com/novus-engine/novuspack/actions/workflows/go-ci.yml [workflow-go-bdd]: https://github.com/novus-engine/novuspack/actions/workflows/go-bdd.yml [workflow-python-lint]: https://github.com/novus-engine/novuspack/actions/workflows/python-lint.yml -[license-file]: LICENSE +[license-file]: LICENSE.txt diff --git a/requirements-lint.txt b/scripts/requirements-lint.txt similarity index 83% rename from requirements-lint.txt rename to scripts/requirements-lint.txt index cf5fa250..6ab2089c 100644 --- a/requirements-lint.txt +++ b/scripts/requirements-lint.txt @@ -1,5 +1,5 @@ # Python lint tooling for make lint-python and CI (python-lint.yml). 
-# Install into a venv: python3 -m venv .venv && .venv/bin/pip install -r requirements-lint.txt +# Install into a venv: python3 -m venv .venv && .venv/bin/pip install -r scripts/requirements-lint.txt # Or use: make venv flake8 pylint From ebb5210e45409ad60c3c05935da27f6231f36cbe Mon Sep 17 00:00:00 2001 From: Andre Date: Tue, 3 Feb 2026 03:36:28 -0500 Subject: [PATCH 2/7] build(ci): route docs-check artifacts to tmp and update tooling - Write docs-check validation outputs under tmp/ and exclude tmp/ from markdownlint. - Add Go setup in go-ci. - Update generate-anchor workflow/Make targets to use file/line inputs to avoid backtick quoting issues. - Improve lint-python target to report all failures. --- .github/copilot-instructions.md | 16 +-- .github/workflows/docs-check.yml | 37 +++--- .github/workflows/go-ci.yml | 6 + Makefile | 61 +++++----- ai_files/ai_coding_instructions.md | 21 ++-- api/go/Makefile | 13 ++- scripts/generate_anchor.py | 182 ++++++++++++++--------------- 7 files changed, 175 insertions(+), 161 deletions(-) diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index d244e2cc..0e92fc5e 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -20,10 +20,10 @@ alwaysApply: true - Do not bypass linting in CI, Make targets, or tooling configuration. - Do not add ignore directives (e.g., `# noqa`, `# pylint: disable=...`, `//nolint`) to silence failures unless explicitly instructed. - Fix the underlying issue instead of suppressing it. -- **Shell quoting with backticks:** When passing text containing backticks (e.g., markdown headings with inline code like `` `code` ``) to shell commands or Makefile targets, always use single quotes to preserve the backticks. 
- - Correct: ```make generate-anchor TEXT='Heading with `code` example'``` - - Incorrect: ```make generate-anchor TEXT="Heading with \`code\` example"``` (shell processes backticks before make sees them) - - This applies to any command or script that accepts text containing backticks. +- **Shell quoting with backticks:** Avoid passing markdown headings directly as command-line string arguments. + Prefer file-based inputs to eliminate quoting issues (especially backticks). + If you must pass text with backticks on the command line, use single quotes (not double quotes) wherever possible + and properly escape the backticks in all other cases. - Avoid using commands which require approval. - Commands that do not require approval: - awk @@ -180,9 +180,8 @@ Quick reference: - Note: Skipped when PATHS is specified (requires all tech specs to validate the index) - **`make validate-req-references [VERBOSE=1]`** - Validate REQ references in feature files - Note: Skipped when PATHS is specified (requires all feature files to validate references) -- **`make generate-anchor TEXT='Heading Text'`** - Generate markdown anchor from heading text - - Use single quotes when TEXT contains backticks: ```make generate-anchor TEXT='Heading with `code` example'``` - - The script automatically removes backticks and their contents when generating anchors +- **`make generate-anchor FILE='path/to/file.md'`** - Print anchors for all headings in a file +- **`make generate-anchor LINE='path/to/file.md:224'`** - Print anchor for the heading at a specific line in a file #### Coverage Audits @@ -217,4 +216,5 @@ All make commands are pre-approved for use: - `make audit-feature-coverage` - Feature coverage audit - `make audit-requirements-coverage` - Requirements coverage audit - `make audit-coverage` - All coverage audits -- `make generate-anchor TEXT='...'` - Generate markdown anchor (use single quotes) +- `make generate-anchor FILE='path/to/file.md'` - Print anchors for all headings in a file +- 
`make generate-anchor LINE='path/to/file.md:224'` - Print anchor for a heading at a specific line diff --git a/.github/workflows/docs-check.yml b/.github/workflows/docs-check.yml index 6f1266a3..f1159792 100644 --- a/.github/workflows/docs-check.yml +++ b/.github/workflows/docs-check.yml @@ -58,94 +58,97 @@ jobs: - name: Install markdownlint-cli2 run: npm install -g markdownlint-cli2 + - name: Create tmp directory + run: mkdir -p tmp + - name: Run Go code blocks validation - run: python3 scripts/validate_go_code_blocks.py --output go_code_blocks_report.md --verbose + run: python3 scripts/validate_go_code_blocks.py --output tmp/go_code_blocks_report.md --verbose - name: Upload Go code blocks report if: failure() uses: actions/upload-artifact@v4 with: name: go-code-blocks-validation-report - path: go_code_blocks_report.md + path: tmp/go_code_blocks_report.md if-no-files-found: ignore - name: Run Go signature consistency validation - run: python3 scripts/validate_go_spec_signature_consistency.py --output signature_consistency_report.txt --verbose + run: python3 scripts/validate_go_spec_signature_consistency.py --output tmp/signature_consistency_report.txt --verbose - name: Upload signature consistency report if: failure() uses: actions/upload-artifact@v4 with: name: signature-consistency-report - path: signature_consistency_report.txt + path: tmp/signature_consistency_report.txt if-no-files-found: ignore - name: Run heading numbering validation - run: python3 scripts/validate_heading_numbering.py --output heading_numbering_report.txt --verbose + run: python3 scripts/validate_heading_numbering.py --output tmp/heading_numbering_report.txt --verbose - name: Upload heading numbering report if: failure() uses: actions/upload-artifact@v4 with: name: heading-numbering-validation-report - path: heading_numbering_report.txt + path: tmp/heading_numbering_report.txt if-no-files-found: ignore - name: Run Go definitions index validation - run: python3 
scripts/validate_api_go_defs_index.py --output go_defs_index_report.txt --verbose + run: python3 scripts/validate_api_go_defs_index.py --output tmp/go_defs_index_report.txt --verbose - name: Upload Go definitions index report if: failure() uses: actions/upload-artifact@v4 with: name: go-definitions-index-report - path: go_defs_index_report.txt + path: tmp/go_defs_index_report.txt if-no-files-found: ignore - name: Run requirement reference validation - run: python3 scripts/validate_req_references.py --output req_references_report.txt --verbose + run: python3 scripts/validate_req_references.py --output tmp/req_references_report.txt --verbose - name: Upload requirement reference report if: failure() uses: actions/upload-artifact@v4 with: name: requirement-reference-report - path: req_references_report.txt + path: tmp/req_references_report.txt if-no-files-found: ignore - name: Run feature coverage audit - run: python3 scripts/audit_feature_coverage.py --output feature_coverage_report.txt --verbose + run: python3 scripts/audit_feature_coverage.py --output tmp/feature_coverage_report.txt --verbose - name: Upload feature coverage report if: failure() uses: actions/upload-artifact@v4 with: name: feature-coverage-report - path: feature_coverage_report.txt + path: tmp/feature_coverage_report.txt if-no-files-found: ignore - name: Run requirements coverage audit - run: python3 scripts/audit_requirements_coverage.py --output requirements_coverage_report.txt --verbose + run: python3 scripts/audit_requirements_coverage.py --output tmp/requirements_coverage_report.txt --verbose - name: Upload requirements coverage report if: failure() uses: actions/upload-artifact@v4 with: name: requirements-coverage-report - path: requirements_coverage_report.txt + path: tmp/requirements_coverage_report.txt if-no-files-found: ignore - name: Run link validation - run: python3 scripts/validate_links.py --output validation_report.txt --verbose + run: python3 scripts/validate_links.py --output 
tmp/validation_report.txt --verbose - name: Upload link validation report if: failure() uses: actions/upload-artifact@v4 with: name: link-validation-report - path: validation_report.txt + path: tmp/validation_report.txt if-no-files-found: ignore - name: Run markdown linting if: always() - run: NODE_OPTIONS="--no-warnings=MODULE_TYPELESS_PACKAGE_JSON" markdownlint-cli2 "**/*.md" + run: NODE_OPTIONS="--no-warnings=MODULE_TYPELESS_PACKAGE_JSON" markdownlint-cli2 "**/*.md" "#tmp/**" diff --git a/.github/workflows/go-ci.yml b/.github/workflows/go-ci.yml index 762b845f..96e99233 100644 --- a/.github/workflows/go-ci.yml +++ b/.github/workflows/go-ci.yml @@ -108,6 +108,12 @@ jobs: - name: Checkout code uses: actions/checkout@v4 + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version: '1.25' + cache-dependency-path: api/go/go.sum + - name: Set up Python uses: actions/setup-python@v5 with: diff --git a/Makefile b/Makefile index e1890349..281dec75 100644 --- a/Makefile +++ b/Makefile @@ -40,8 +40,8 @@ ci-go: ci-go-v1 ci-go-v1: /usr/bin/make -C api/go ci -# Python venv for lint tooling - creates .venv and installs requirements-lint.txt -# Run once (or after adding/updating requirements-lint.txt) so make lint-python uses the venv. +# Python venv for lint tooling - creates .venv and installs scripts/requirements-lint.txt +# Run once (or after adding/updating scripts/requirements-lint.txt) so make lint-python uses the venv. # Usage: make venv venv: @command -v python3 >/dev/null 2>&1 || { \ @@ -50,7 +50,7 @@ venv: } @python3 -m venv .venv @.venv/bin/pip install -q --upgrade pip - @.venv/bin/pip install -q -r requirements-lint.txt + @.venv/bin/pip install -q -r scripts/requirements-lint.txt @echo "Created .venv with lint tooling. Use 'make lint-python' (it will use .venv when present)." 
# Markdown linting - performs same checks as GitHub Actions workflow @@ -119,30 +119,28 @@ lint-python: LINT_PATHS="scripts"; \ fi; \ if [ -d .venv ]; then PATH="$(CURDIR)/.venv/bin:$$PATH"; export PATH; fi; \ + export PYTHONPATH="$(CURDIR)/scripts"; \ echo "Running flake8 on Python scripts..."; \ - flake8 $$LINT_PATHS --jobs=1; \ + flake8 $$LINT_PATHS --jobs=1; FLAKE8_RESULT=$$?; \ echo "Running pylint on Python scripts..."; \ - pylint --rcfile=.pylintrc $$LINT_PATHS; \ + pylint --rcfile=.pylintrc $$LINT_PATHS; PYLINT_RESULT=$$?; \ echo "Running radon complexity (non-gating)..."; \ radon cc -s -a $$LINT_PATHS || true; \ echo "Running xenon cyclomatic complexity check (fail if any block > C)..."; \ - xenon -b C $$LINT_PATHS; \ + xenon -b C $$LINT_PATHS; XENON_RESULT=$$?; \ echo "Running radon maintainability metrics (non-gating)..."; \ radon mi -s $$LINT_PATHS || true; \ echo "Running radon maintainability check (fail if any module MI rank C)..."; \ TMP_MI=$$(mktemp); \ radon mi -j $$LINT_PATHS -O $$TMP_MI; \ - python3 -c "\ - import sys, json; \ - d=json.load(open(sys.argv[1])); \ - bad=[k for k,v in d.items() if v.get('rank')=='C']; \ - [print('MI rank C (low maintainability):', f) for f in bad]; \ - sys.exit(1 if bad else 0)" $$TMP_MI; \ - MI_RESULT=$$?; rm -f $$TMP_MI; [ $$MI_RESULT -ne 0 ] && exit $$MI_RESULT; \ + python3 -c "import sys, json; d=json.load(open(sys.argv[1])); bad=[k for k,v in d.items() if v.get('rank')=='C']; [print('MI rank C (low maintainability):', f) for f in bad]; sys.exit(1 if bad else 0)" $$TMP_MI; \ + MI_RESULT=$$?; rm -f $$TMP_MI; \ echo "Running vulture unused code detection (non-gating)..."; \ vulture $$LINT_PATHS --min-confidence 80 || true; \ echo "Running bandit security scan (non-gating)..."; \ - bandit -r $$LINT_PATHS --exit-zero + bandit -r $$LINT_PATHS; BANDIT_RESULT=$$?; \ + echo ""; echo "Lint exit codes: flake8=$$FLAKE8_RESULT pylint=$$PYLINT_RESULT xenon=$$XENON_RESULT radon_mi=$$MI_RESULT bandit=$$BANDIT_RESULT"; \ + [ 
$$FLAKE8_RESULT -ne 0 ] || [ $$PYLINT_RESULT -ne 0 ] || [ $$XENON_RESULT -ne 0 ] || [ $$MI_RESULT -ne 0 ] || [ $$BANDIT_RESULT -ne 0 ] && exit 1; exit 0 # Link validation - validates all internal markdown links and anchors # NOTE: This target must be kept in sync with .github/workflows/docs-check.yml. @@ -212,29 +210,38 @@ apply-heading-corrections: if [ -n "$(VERBOSE)" ]; then ARGS="$$ARGS --verbose"; fi; \ eval python3 scripts/apply_heading_corrections.py $$ARGS -# Generate markdown anchor from heading text -# NOTE: This is a utility script for generating markdown anchors from heading text. +# Generate markdown anchors from markdown headings +# NOTE: This is a utility wrapper for scripts/generate_anchor.py. # Useful for creating links to specific sections in markdown files. # Requires: Python 3 -# Usage: make generate-anchor TEXT="Heading Text" -# - TEXT: Heading text to convert to anchor -# - IMPORTANT: If TEXT contains backticks, you MUST use single quotes: -# make generate-anchor TEXT='Heading with `code` example' -# Using double quotes will cause the shell to process backticks as command -# substitution before make sees them, which will break the heading text. +# Usage: make generate-anchor FILE="path/to/file.md" +# make generate-anchor LINE="path/to/file.md:224" +# - FILE: Print anchors for all headings in the file +# - LINE: Print anchor for the heading at a specific line in the file generate-anchor: @command -v python3 >/dev/null 2>&1 || { \ echo "Error: python3 not found. 
Install Python 3 to generate anchor."; \ exit 1; \ } - @if [ -z "$(TEXT)" ]; then \ - echo "Error: TEXT is required."; \ + @if [ -z "$(FILE)" ] && [ -z "$(LINE)" ]; then \ + echo "Error: FILE or LINE is required."; \ echo ""; \ - echo "Usage: make generate-anchor TEXT=\"Heading Text\""; \ - echo " make generate-anchor TEXT='Heading with \`code\` example' (use single quotes for backticks)"; \ + echo "Usage: make generate-anchor FILE=\"path/to/file.md\""; \ + echo " make generate-anchor LINE=\"path/to/file.md:224\""; \ exit 1; \ fi - @python3 scripts/generate_anchor.py --text "$(TEXT)" + @if [ -n "$(FILE)" ] && [ -n "$(LINE)" ]; then \ + echo "Error: FILE and LINE are mutually exclusive. Provide only one."; \ + echo ""; \ + echo "Usage: make generate-anchor FILE=\"path/to/file.md\""; \ + echo " make generate-anchor LINE=\"path/to/file.md:224\""; \ + exit 1; \ + fi + @if [ -n "$(LINE)" ]; then \ + python3 scripts/generate_anchor.py --line "$(LINE)"; \ + else \ + python3 scripts/generate_anchor.py --file "$(FILE)"; \ + fi # Requirement reference validation - validates REQ references in feature files # NOTE: This target must be kept in sync with .github/workflows/docs-check.yml. diff --git a/ai_files/ai_coding_instructions.md b/ai_files/ai_coding_instructions.md index f3bb076a..5b6f66d4 100644 --- a/ai_files/ai_coding_instructions.md +++ b/ai_files/ai_coding_instructions.md @@ -41,15 +41,13 @@ The following principles guide all development work on this project. ### 1.5 Shell Quoting with Backticks -- **Critical Rule:** When passing text containing backticks (e.g., markdown headings with inline code like `` `code` ``) to shell commands or Makefile targets, always use single quotes to preserve the backticks. -- **Why:** The shell processes backticks as command substitution before make or scripts see them, which will break the text. 
-- **Correct Examples:** - - `` make generate-anchor TEXT='Heading with `code` example' `` - - `` python3 scripts/generate_anchor.py --text 'File Management with `Package` type' `` -- **Incorrect Examples:** - - `` make generate-anchor TEXT="Heading with \`code\` example" `` (shell processes backticks before make sees them) - - `` python3 scripts/generate_anchor.py --text "File Management with \`Package\` type" `` (same issue) -- **Applies To:** Any command or script that accepts text containing backticks, including the `generate-anchor` Make target and related scripts. +- **Critical Rule:** Avoid passing markdown headings (especially ones containing backticks like `` `code` ``) as command-line string arguments. +- **Why:** Backticks are command substitution in shells, and quoting/escaping is error-prone for both humans and AI agents. +- **Fallback (when text must be on the command line):** Use single quotes (not double quotes) wherever possible and escape backticks in all other cases. +- **Correct Examples (preferred, file-based):** + - `` make generate-anchor LINE='docs/tech_specs/api_core.md:224' `` + - `` make generate-anchor FILE='docs/tech_specs/api_core.md' `` +- **Applies To:** Any tooling that needs to work with markdown headings containing backticks. ## 2. 
Development Workflow @@ -294,9 +292,8 @@ Quick reference: - Note: Skipped when PATHS is specified (requires all tech specs) - **`make validate-req-references [VERBOSE=1]`** - Validate requirement references - Note: Skipped when PATHS is specified (requires all feature files) -- **`make generate-anchor TEXT='Heading Text'`** - Generate markdown anchor from heading text - - Use single quotes (see example in section 1.5) - - The script automatically removes backticks and their contents when generating anchors +- **`make generate-anchor FILE='path/to/file.md'`** - Print anchors for all headings in a file +- **`make generate-anchor LINE='path/to/file.md:224'`** - Print anchor for the heading at a specific line in a file - See [Shell Quoting with Backticks](#15-shell-quoting-with-backticks) for details #### Coverage Audit Targets diff --git a/api/go/Makefile b/api/go/Makefile index eb3096d9..083cbea1 100644 --- a/api/go/Makefile +++ b/api/go/Makefile @@ -11,11 +11,11 @@ ifneq ($(MAKE),$(SYSTEM_MAKE)) override MAKE := $(SYSTEM_MAKE) endif -.PHONY: test bdd bdd-ci bdd-domain ci tidy coverage coverage-html coverage-report validate-go-spec-references apply-go-spec-references +.PHONY: test bdd bdd-ci bdd-domain ci tidy go-fmt coverage coverage-html coverage-report validate-go-spec-references apply-go-spec-references BDD_TAGS ?= '~@skip && ~@wip' BDD_DOMAIN ?= '' -BDD_OUTPUT_DIR ?= ../../../tmp +BDD_OUTPUT_DIR ?= ../../tmp BDD_OUTPUT_FILE ?= $(BDD_OUTPUT_DIR)/bdd_test_output_$(shell date +%Y%m%d_%H%M%S).txt # Tidy dependencies - must use bdd build tag to preserve BDD dependencies @@ -24,6 +24,12 @@ tidy: @GOFLAGS=-tags=bdd go mod tidy @echo "Dependencies tidied successfully." +# Format Go source files +go-fmt: + @echo "Running go fmt..." + @go fmt ./... + @echo "Formatting complete." 
+ # Run all unit tests # Set GOCACHE to a writable location if the default cache is not writable # Test actual write capability by attempting to create a test file @@ -132,8 +138,7 @@ lint: @go vet ./... @echo "" @echo "Running golangci-lint..." - @golangci-lint run --exclude-dir=_bdd ./... - @golangci-lint run --build-tags=bdd ./... + @golangci-lint run ./... # Go signature validation - validates Go signatures in implementation against tech specs # NOTE: This target must be kept in sync with .github/workflows/docs-check.yml. diff --git a/scripts/generate_anchor.py b/scripts/generate_anchor.py index 19cba339..ea1d75a8 100755 --- a/scripts/generate_anchor.py +++ b/scripts/generate_anchor.py @@ -1,129 +1,125 @@ #!/usr/bin/env python3 """ -Generate markdown anchor from heading text. +Generate markdown anchors from markdown headings. -This script generates GitHub-style markdown anchors from heading text, -which can be used in markdown links to reference specific sections. +This script generates GitHub-style markdown anchors from markdown headings. 
+It supports generating anchors for: -Usage: - python3 scripts/generate_anchor.py "Heading Text" - python3 scripts/generate_anchor.py --text "Heading Text" - echo "Heading Text" | python3 scripts/generate_anchor.py +- A specific heading line in a file (via --line) +- All headings in a file (via --file) Options: - --text, -t TEXT Heading text to convert to anchor + --file, -f FILE Markdown file to scan and print anchors for all headings + --line, -l LINE File + line reference in the format: path.md:224 --help, -h Show this help message Examples: - # From command line argument - python3 scripts/generate_anchor.py "1.2.3 AddFile Package Method" - # Output: #123-addfile-package-method - - # From stdin - echo "1.2.3 AddFile Package Method" | python3 scripts/generate_anchor.py - # Output: #123-addfile-package-method - - # Using --text switch - python3 scripts/generate_anchor.py --text "File Management" - # Output: #file-management - - # Via Makefile - make generate-anchor TEXT="1.2.3 AddFile Package Method" - # Output: #123-addfile-package-method - - # Headings with backticks (use single quotes to preserve backticks) - python3 scripts/generate_anchor.py '1.2.3 AddFile with `code` example' - # Output: #123-addfile-with-code-example - - # Headings with backticks via --text (single quotes recommended) - python3 scripts/generate_anchor.py --text 'File Management with `Package` type' - # Output: #file-management-with-package-type - - # Headings with backticks via Makefile (use single quotes) - make generate-anchor TEXT='1.2.3 AddFile with `code` example' - # Output: #123-addfile-with-code-example - -Note: When headings contain backticks (e.g., `code`), use single quotes - around the heading text to preserve the backticks. The script will - automatically remove backticks and their contents when generating - the anchor, as per GitHub markdown anchor generation rules. 
+ # Generate anchor for a specific heading line in a file + python3 scripts/generate_anchor.py --line docs/tech_specs/api_core.md:42 + # Output: #some-heading-anchor + + # Print anchors for all headings in a file + python3 scripts/generate_anchor.py --file docs/tech_specs/api_core.md + # Output (one per heading): + # docs/tech_specs/api_core.md:1: H1 Title => #title """ -import sys import argparse +import sys from pathlib import Path +from typing import Tuple + +from lib._validation_utils import extract_headings_from_file, generate_anchor_from_heading + + +def _parse_line_ref(line_ref: str) -> Tuple[Path, int]: + """ + Parse a file:line reference (e.g., "docs/x.md:224"). + """ + if not line_ref or ":" not in line_ref: + raise ValueError("LINE must be in the format: path.md:224") + + path_str, line_str = line_ref.rsplit(":", 1) + path_str = path_str.strip() + line_str = line_str.strip() + + if not path_str: + raise ValueError("LINE must include a file path before ':'") -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" + try: + line_num = int(line_str) + except ValueError as e: + raise ValueError("LINE must end with an integer line number") from e -# Import shared utilities -if str(scripts_dir) not in sys.path: - sys.path.insert(0, str(scripts_dir)) + if line_num <= 0: + raise ValueError("LINE number must be >= 1") -from lib._validation_utils import ( # noqa: E402 - generate_anchor_from_heading, -) + return Path(path_str), line_num + + +def _generate_anchor_for_line(file_path: Path, line_num: int) -> str: + if not file_path.exists(): + raise FileNotFoundError(f"File not found: {file_path}") + + headings = extract_headings_from_file(file_path, skip_code_blocks=True) + for heading_text, _level, heading_line in headings: + if heading_line == line_num: + return generate_anchor_from_heading(heading_text, include_hash=True) + + raise ValueError( + f"No markdown heading found at {file_path}:{line_num} " + "(note: headings inside code blocks are ignored)" + 
) + + +def _print_anchors_for_file(file_path: Path) -> None: + if not file_path.exists(): + raise FileNotFoundError(f"File not found: {file_path}") + + headings = extract_headings_from_file(file_path, skip_code_blocks=True) + for heading_text, level, line_num in headings: + anchor = generate_anchor_from_heading(heading_text, include_hash=True) + print(f"{file_path}:{line_num}: H{level} {heading_text} => {anchor}") def main(): """Main entry point for the script.""" parser = argparse.ArgumentParser( - description='Generate markdown anchor from heading text.', + description='Generate markdown anchors from markdown headings.', formatter_class=argparse.RawDescriptionHelpFormatter, epilog=""" Examples: - %(prog)s "1.2.3 AddFile Package Method" - echo "File Management" | %(prog)s - %(prog)s --text "File Management" - - # Headings with backticks (use single quotes to preserve backticks) - %(prog)s '1.2.3 AddFile with `code` example' - %(prog)s --text 'File Management with `Package` type' - - Note: When headings contain backticks (e.g., `code`), use single quotes - around the heading text to preserve the backticks. The script will - automatically remove backticks and their contents when generating - the anchor, as per GitHub markdown anchor generation rules. 
+ # Generate anchor for a specific heading line in a file + %(prog)s --line docs/tech_specs/api_core.md:42 + + # Print anchors for all headings in a file + %(prog)s --file docs/tech_specs/api_core.md """ ) - parser.add_argument( - '--text', '-t', + group = parser.add_mutually_exclusive_group(required=True) + group.add_argument( + '--file', '-f', type=str, - help='Heading text to convert to anchor (use single quotes if heading contains backticks)' + help='Markdown file to scan and print anchors for all headings' ) - parser.add_argument( - 'heading', - nargs='?', + group.add_argument( + '--line', '-l', type=str, - help=( - 'Heading text to convert to anchor (alternative to --text; ' - 'use single quotes if heading contains backticks)' - ) + help='File + line reference in the format: path.md:224' ) args = parser.parse_args() - # Get heading text from argument, --text switch, or stdin - heading_text = None - if args.text: - heading_text = args.text - elif args.heading: - heading_text = args.heading - else: - # Read from stdin - try: - heading_text = sys.stdin.read().strip() - except (EOFError, KeyboardInterrupt): - parser.print_help() - sys.exit(1) - - if not heading_text: - parser.print_help() - sys.exit(1) - - # Generate and output anchor (with '#' prefix for CLI output) - anchor = generate_anchor_from_heading(heading_text, include_hash=True) - print(anchor) + try: + if args.line: + file_path, line_num = _parse_line_ref(args.line) + anchor = _generate_anchor_for_line(file_path, line_num) + print(anchor) + else: + _print_anchors_for_file(Path(args.file)) + except (OSError, ValueError) as e: + print(f"Error: {e}", file=sys.stderr) + return 1 return 0 From 20a970d6994972cab5db29f442f69100a14d1562 Mon Sep 17 00:00:00 2001 From: Andre Date: Tue, 3 Feb 2026 03:37:24 -0500 Subject: [PATCH 3/7] refactor(scripts): modularize docs validation tooling Split large script utilities into focused modules (validation, go_markdown, heading_numbering, index utils, and spec reference 
helpers) and update validation entrypoints and docs accordingly. --- scripts/README.md | 151 +- scripts/apply_go_code_blocks_corrections.py | 48 +- scripts/apply_heading_corrections.py | 71 +- scripts/audit_feature_coverage.py | 8 +- scripts/audit_requirements_coverage.py | 128 +- scripts/lib/README.md | 18 +- scripts/lib/_go_code_utils.py | 1930 +------------- scripts/lib/_go_code_utils_test.py | 38 +- scripts/lib/_index_utils.py | 559 +--- scripts/lib/_index_utils_parsing.py | 456 ++++ scripts/lib/_index_utils_rendering.py | 129 + .../lib/_validate_go_code_blocks_report.py | 307 +++ .../_validate_go_signature_sync_helpers.py | 237 ++ .../_validate_go_spec_references_models.py | 79 + ...idate_go_spec_references_section_finder.py | 285 ++ .../_validate_go_spec_references_validator.py | 566 ++++ ...e_go_spec_signature_consistency_helpers.py | 254 ++ .../_validate_heading_numbering_helpers.py | 146 ++ .../lib/_validate_heading_numbering_models.py | 34 + .../lib/_validate_heading_numbering_report.py | 484 ++++ .../_validate_heading_numbering_title_case.py | 241 ++ scripts/lib/_validate_links_helpers.py | 234 ++ scripts/lib/_validation_utils.py | 2284 +---------------- .../_audit_requirements_scan.py | 8 +- .../go_defs_index/_go_defs_index_anchors.py | 633 +++-- .../go_defs_index/_go_defs_index_config.py | 75 +- .../_go_defs_index_descriptions.py | 8 +- .../go_defs_index/_go_defs_index_discovery.py | 376 +-- .../go_defs_index/_go_defs_index_headings.py | 261 +- .../go_defs_index/_go_defs_index_matching.py | 372 ++- .../_go_defs_index_matching_helpers.py | 28 + .../go_defs_index/_go_defs_index_ordering.py | 56 +- .../go_defs_index/_go_defs_index_reporting.py | 279 +- .../go_defs_index/_go_defs_index_scoring.py | 159 ++ .../_go_defs_index_scoring_domain.py | 94 +- .../_go_defs_index_scoring_domain_test.py | 95 + .../_go_defs_index_scoring_rules_core_base.py | 20 +- ...go_defs_index_scoring_rules_core_domain.py | 45 +- .../_go_defs_index_scoring_rules_methods.py | 72 +- 
.../_go_defs_index_scoring_rules_penalties.py | 159 +- .../_go_defs_index_scoring_rules_sections.py | 192 +- ..._defs_index_scoring_rules_type_keywords.py | 395 +++ .../go_defs_index/_go_defs_index_shared.py | 2 - scripts/lib/go_markdown/__init__.py | 67 + scripts/lib/go_markdown/_base.py | 994 +++++++ scripts/lib/go_markdown/_rest.py | 921 +++++++ scripts/lib/heading_numbering/__init__.py | 23 + scripts/lib/heading_numbering/_checks.py | 283 ++ scripts/lib/validation/__init__.py | 115 + scripts/lib/validation/_core.py | 71 + scripts/lib/validation/_fs.py | 291 +++ scripts/lib/validation/_markdown.py | 970 +++++++ scripts/lib/validation/_output.py | 933 +++++++ scripts/validate_api_go_defs_index.md | 177 +- scripts/validate_api_go_defs_index.py | 143 +- scripts/validate_go_code_blocks.py | 720 ++---- scripts/validate_go_signature_sync.py | 836 +++--- scripts/validate_go_spec_references.py | 1012 +------- .../validate_go_spec_signature_consistency.py | 1381 +++++----- scripts/validate_heading_numbering.py | 1631 ++---------- scripts/validate_links.py | 944 +++---- scripts/validate_req_references.py | 434 ++-- 62 files changed, 13272 insertions(+), 10690 deletions(-) create mode 100644 scripts/lib/_index_utils_parsing.py create mode 100644 scripts/lib/_index_utils_rendering.py create mode 100644 scripts/lib/_validate_go_code_blocks_report.py create mode 100644 scripts/lib/_validate_go_signature_sync_helpers.py create mode 100644 scripts/lib/_validate_go_spec_references_models.py create mode 100644 scripts/lib/_validate_go_spec_references_section_finder.py create mode 100644 scripts/lib/_validate_go_spec_references_validator.py create mode 100644 scripts/lib/_validate_go_spec_signature_consistency_helpers.py create mode 100644 scripts/lib/_validate_heading_numbering_helpers.py create mode 100644 scripts/lib/_validate_heading_numbering_models.py create mode 100644 scripts/lib/_validate_heading_numbering_report.py create mode 100644 
scripts/lib/_validate_heading_numbering_title_case.py create mode 100644 scripts/lib/_validate_links_helpers.py create mode 100644 scripts/lib/go_defs_index/_go_defs_index_matching_helpers.py create mode 100644 scripts/lib/go_defs_index/_go_defs_index_scoring_domain_test.py create mode 100644 scripts/lib/go_defs_index/_go_defs_index_scoring_rules_type_keywords.py create mode 100644 scripts/lib/go_markdown/__init__.py create mode 100644 scripts/lib/go_markdown/_base.py create mode 100644 scripts/lib/go_markdown/_rest.py create mode 100644 scripts/lib/heading_numbering/__init__.py create mode 100644 scripts/lib/heading_numbering/_checks.py create mode 100644 scripts/lib/validation/__init__.py create mode 100644 scripts/lib/validation/_core.py create mode 100644 scripts/lib/validation/_fs.py create mode 100644 scripts/lib/validation/_markdown.py create mode 100644 scripts/lib/validation/_output.py diff --git a/scripts/README.md b/scripts/README.md index 08b976c1..e0c0122f 100644 --- a/scripts/README.md +++ b/scripts/README.md @@ -64,12 +64,11 @@ - [Go Code Blocks Validation Requirements](#go-code-blocks-validation-requirements) - [Go Code Blocks Validation Output Example](#go-code-blocks-validation-output-example) - [validate\_api\_go\_defs\_index.py](#validate_api_go_defs_indexpy) - - [apply\_missing\_go\_defs\_index.py](#apply_missing_go_defs_indexpy) - - [Go Definitions Index Fixer Purpose](#go-definitions-index-fixer-purpose) - - [Go Definitions Index Fixer Usage](#go-definitions-index-fixer-usage) - - [Go Definitions Index Fixer Features](#go-definitions-index-fixer-features) - - [Go Definitions Index Fixer Exit Codes](#go-definitions-index-fixer-exit-codes) - - [Go Definitions Index Fixer Requirements](#go-definitions-index-fixer-requirements) + - [Go Definitions Index Validation Purpose](#go-definitions-index-validation-purpose) + - [Go Definitions Index Validation Usage](#go-definitions-index-validation-usage) + - [Go Definitions Index Validation 
Features](#go-definitions-index-validation-features) + - [Go Definitions Index Validation Exit Codes](#go-definitions-index-validation-exit-codes) + - [Go Definitions Index Validation Requirements](#go-definitions-index-validation-requirements) - [validate\_go\_signature\_sync.py](#validate_go_signature_syncpy) - [Go Signature Sync Validation Purpose](#go-signature-sync-validation-purpose) - [Go Signature Sync Validation Usage](#go-signature-sync-validation-usage) @@ -130,7 +129,7 @@ This directory contains utility scripts for the NovusPack project. ### generate_anchor.py -Generates GitHub-style markdown anchors from heading text. +Generates GitHub-style markdown anchors from markdown headings. #### Anchor Generation Purpose @@ -142,29 +141,18 @@ Generates GitHub-style markdown anchors from heading text. #### Anchor Generation Usage ```bash -# From command line argument -python3 scripts/generate_anchor.py "1.2.3 AddFile Package Method" -# Output: #123-addfile-package-method - -# From stdin -echo "File Management" | python3 scripts/generate_anchor.py -# Output: #file-management - -# Using --text switch -python3 scripts/generate_anchor.py --text "File Management" -# Output: #file-management - -# Via Makefile -make generate-anchor TEXT="1.2.3 AddFile Package Method" -# Output: #123-addfile-package-method - -# Headings with backticks (use single quotes to preserve backticks) -python3 scripts/generate_anchor.py '1.2.3 AddFile with `code` example' -# Output: #123-addfile-with-code-example - -# Headings with backticks via Makefile (use single quotes) -make generate-anchor TEXT='1.2.3 AddFile with `code` example' -# Output: #123-addfile-with-code-example +# Generate anchor for the heading at a specific file line +python3 scripts/generate_anchor.py --line docs/tech_specs/api_core.md:42 +# Output: #some-heading-anchor + +# Print anchors for all headings in a file +python3 scripts/generate_anchor.py --file docs/tech_specs/api_core.md +# Output (one per heading): +# 
docs/tech_specs/api_core.md:1: H1 Title => #title + +# Via Makefile (preferred) +make generate-anchor LINE="docs/tech_specs/api_core.md:42" +make generate-anchor FILE="docs/tech_specs/api_core.md" ``` #### Anchor Generation Features @@ -182,14 +170,15 @@ make generate-anchor TEXT='1.2.3 AddFile with `code` example' The script is available via Makefile: ```bash -# Generate anchor from heading text -make generate-anchor TEXT="Heading Text" +# Print anchors for all headings in a file +make generate-anchor FILE="docs/tech_specs/api_core.md" -# Headings with backticks (use single quotes) -make generate-anchor TEXT='Heading with `code` example' +# Print anchor for the heading at a specific line in a file +make generate-anchor LINE="docs/tech_specs/api_core.md:42" ``` -**Note:** When headings contain backticks (e.g., `code`), use single quotes around the heading text to preserve the backticks. The script will automatically remove backticks and their contents when generating the anchor, as per GitHub markdown anchor generation rules. +**Note:** This interface avoids passing heading text through the shell, which eliminates quoting issues (including backticks). +If you must pass text containing backticks on the command line for other tooling, prefer single quotes and escape backticks as needed. 
#### Anchor Generation Requirements @@ -215,8 +204,8 @@ python3 scripts/validate_links.py # With verbose output python3 scripts/validate_links.py --verbose -# Save detailed report to file -python3 scripts/validate_links.py --output report.txt +# Save detailed report to file (use tmp/ for reports) +python3 scripts/validate_links.py --output tmp/validation_report.txt # Check requirements coverage (ensures all requirements reference tech specs) python3 scripts/validate_links.py --check-coverage @@ -396,16 +385,13 @@ python3 scripts/validate_heading_numbering.py --help - Validates sequential numbering within parent sections - Ensures child heading numbers match parent prefixes - Detects duplicate heading titles (excluding numbering) across all levels -- Detects backticks in headings and flags them as formatting errors - - Headings should not contain backticks; use plain text instead - - Suggestions automatically remove backticks while preserving content +- Allows backticks in headings; case inside backticks is not checked for Title Case - Warns about H3+ headings with numbering exceeding 20 (e.g., "### 3.25") - Warns about H4+ headings with single-word titles (e.g., "#### 1.2.3 Title") - Warns about overly-deeply nested headings (H6 and beyond) - Provides line numbers and detailed error messages - Organized reporting by error type: - Organizational heading errors (headings with no content) - - Heading formatting errors (e.g., backticks) - Heading numbering errors (numbering issues) - Sorted headings output shows only headings with numbering errors (excludes duplicate-only errors) - Regex patterns compiled at module level for performance @@ -878,13 +864,16 @@ python3 scripts/validate_go_code_blocks.py --help - Type aliases: `type Name SomeType` (including built-in types) - Validates that each code block has at most one type/interface definition - Checks that each code block is under a different heading +- Validates heading format: definition name and kind word (e.g. 
`` `Package.Write` Method ``); definition names preferred in backticks; case inside backticks ignored +- Emits `heading_prefer_backticks` warning when a definition name in a heading is not in backticks (suggests corrected heading) +- When only warnings (e.g. heading_prefer_backticks) are found, the script exits 0 - Reports issues with file paths and line numbers - Provides summary statistics and detailed breakdown #### Go Code Blocks Validation Exit Codes -- `0`: All Go code blocks comply with requirements -- `1`: One or more code blocks violate requirements +- `0`: All Go code blocks comply with requirements, or only warnings (e.g. heading_prefer_backticks) were found +- `1`: One or more code blocks violate requirements (errors present) #### Go Code Blocks Validation Integration @@ -925,59 +914,52 @@ Breakdown by issue type: ### validate_api_go_defs_index.py +Validates that all Go API definitions in tech specs are listed in the Go definitions index. + Business logic and usage are documented in [scripts/validate_api_go_defs_index.md](validate_api_go_defs_index.md). Use `make validate-go-defs-index` to run the check. Implementation details are in [`scripts/lib/go_defs_index/`](../scripts/lib/go_defs_index/), including the placement scoring modules used by the validator. -### apply_missing_go_defs_index.py +#### Go Definitions Index Validation Purpose -Applies selected fixes to `api_go_defs_index.md` based on the validation output. +- Ensures every discovered Go API definition in tech specs appears in the index. +- Detects missing index entries, orphaned entries, wrong-section entries, and incorrect link targets. +- Enforces description rules (minimum length and uniqueness). +- Reports low-confidence placements that require manual review. 
-#### Go Definitions Index Fixer Purpose +#### Go Definitions Index Validation Usage -- Adds missing definitions with high-confidence placement -- Moves entries to the validator-suggested section -- Updates incorrect links to canonical anchors -- Fills missing or short descriptions when a comment-based summary is provided -- Removes orphaned entries (in index but not found in any tech spec) +```bash +# Run the validator (full scan). +make validate-go-defs-index -#### Go Definitions Index Fixer Usage +# Verbose output and write a report file. +make validate-go-defs-index VERBOSE=1 NO_COLOR=1 OUTPUT="tmp/go_defs_index.txt" -```bash -# Capture validator output (use NO_COLOR for stable parsing) -make validate-go-defs-index VERBOSE=1 NO_COLOR=1 > tmp/go_defs_index.txt 2>&1 - -# Apply fixes and re-render all leaf sections (default) -python3 scripts/apply_missing_go_defs_index.py \ - --input tmp/go_defs_index.txt \ - --index-file docs/tech_specs/api_go_defs_index.md - -# Only update sections that changed (skip full reflow) -python3 scripts/apply_missing_go_defs_index.py \ - --input tmp/go_defs_index.txt \ - --index-file docs/tech_specs/api_go_defs_index.md --no-normalize +# Apply high-confidence index updates (interactive confirmation required). 
+make validate-go-defs-index APPLY=1 ``` -#### Go Definitions Index Fixer Features +#### Go Definitions Index Validation Features -- Parses validator output for actionable fixes -- Uses shared `IndexEntry` model from `scripts/lib` so descriptions are preserved and aligned with the validator -- By default re-renders all leaf sections for consistent blank lines and alphabetical order -- Use `--no-normalize` to only update sections that changed (no full reflow) -- Sorts entries within each section alphabetically by full name -- Removes orphaned entries reported by the validator -- Does not add low-confidence definitions (manual review required) +- Scans `docs/tech_specs/*.md` for ` ```go ` code blocks and extracts types, methods, and functions. +- Builds an expected index tree using confidence-scored placement. +- Compares expected entries with current index entries and reports discrepancies. +- Validates index entry descriptions: + - Missing or too-short descriptions are errors. + - Duplicate description text across entries is an error. +- Emits ordering warnings when existing entries are out of order. -#### Go Definitions Index Fixer Exit Codes +#### Go Definitions Index Validation Exit Codes -- `0`: Fixes applied or no actionable fixes found -- `1`: Input file or index file missing +- `0`: No errors found. +- `1`: Errors found. +- With `NO_FAIL=1`, the validator exits with `0` even when errors are found. 
-#### Go Definitions Index Fixer Requirements +#### Go Definitions Index Validation Requirements - Python 3.x -- Uses `scripts/lib` (IndexEntry, validation parsing, index section helpers) ### validate_go_signature_sync.py @@ -1577,7 +1559,7 @@ This feature is useful for: **Note:** Some scripts require checking all files to validate properly and will skip when `PATHS` is specified: -- `validate-api-go-defs-index` - requires all tech specs to validate the index +- `validate-go-defs-index` - requires all tech specs to validate the index - `validate-req-references` - requires all feature files to validate references - `audit-feature-coverage` - requires all requirements and feature files to validate coverage - `audit-requirements-coverage` - requires all tech specs and requirements to validate coverage @@ -1644,7 +1626,7 @@ All scripts in this directory should be: | `validate_req_references.py` | `make validate-req-references` | `validate-req-references.yml` | ✅ Active | | `audit_feature_coverage.py` | `make audit-feature-coverage` | `audit-coverage.yml` | ✅ Active | | `audit_requirements_coverage.py` | `make audit-requirements-coverage` | `audit-coverage.yml` | ✅ Active | -| `validate_api_go_defs_index.py` | `make validate-api-go-defs-index` | `docs-check` (via Makefile) | ✅ Active | +| `validate_api_go_defs_index.py` | `make validate-go-defs-index` | `docs-check` (via Makefile) | ✅ Active | | `validate_go_code_blocks.py` | `make validate-go-code-blocks` | `docs-check` (via Makefile) | ✅ Active | | `validate_go_spec_signature_consistency.py` | `make validate-go-spec-signature-consistency` | `docs-check` (via Makefile) | ✅ Active | | `validate_go_signature_sync.py` | `make validate-go-signatures` | `go-ci.yml` | ✅ Active | @@ -1679,8 +1661,11 @@ When adding new scripts: 14. Update this README 15. Ensure script has `--help` option 16. Use proper exit codes (0 = success, 1 = failure) -17. 
Use `OutputBuilder` from [`scripts/lib/_validation_utils.py`](../scripts/lib/_validation_utils.py) for consistent output formatting -18. Ensure code passes flake8 linting: `make flake8-lint` +17. Use `OutputBuilder` from [`scripts/lib/_validation_utils.py`](../scripts/lib/_validation_utils.py) for consistent output formatting: + - `add_success_message()` when validation passes with no issues. + - `add_failure_message()` when there are errors. + - `add_warnings_only_message()` when there are only warnings (exit 0; optional `verbose_hint` for run-with-verbose text). +18. Ensure code passes Python linting: `make lint-python` ## Maintenance @@ -1691,7 +1676,7 @@ When modifying existing scripts: 3. Update this README if usage changes 4. Test locally with `make ` before committing 5. Ensure backward compatibility or update all references -6. Run `make flake8-lint` to ensure code quality standards are met +6. Run `make lint-python` to ensure code quality standards are met 7. Add type hints to new functions or when modifying function signatures 8. 
Refactor large functions into smaller, focused helper functions when appropriate diff --git a/scripts/apply_go_code_blocks_corrections.py b/scripts/apply_go_code_blocks_corrections.py index fae755e6..cb1deb91 100644 --- a/scripts/apply_go_code_blocks_corrections.py +++ b/scripts/apply_go_code_blocks_corrections.py @@ -35,19 +35,9 @@ import sys from collections import defaultdict from pathlib import Path -from typing import List, Optional +from typing import List, NamedTuple, Optional -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" - -# Import shared utilities -if str(scripts_dir) not in sys.path: - sys.path.insert(0, str(scripts_dir)) - -from lib._go_code_utils import ( # noqa: E402 # pylint: disable=wrong-import-position - find_go_code_blocks, - find_first_definition, -) +from lib._go_code_utils import find_first_definition, find_go_code_blocks # Pattern to match error/warning lines with ANSI color codes stripped @@ -66,22 +56,25 @@ def strip_ansi_codes(text: str) -> str: return ansi_escape.sub('', text) +class _CorrectionData(NamedTuple): + """Data for a single correction (bundles args for pylint R0917).""" + + filepath: str + line_num: int + issue_type: str + message: str + suggestion: Optional[str] = None + + class Correction: """Represents a single correction to apply.""" - def __init__( # pylint: disable=too-many-positional-arguments - self, - filepath: str, - line_num: int, - issue_type: str, - message: str, - suggestion: Optional[str] = None, - ): - self.filepath = filepath - self.line_num = line_num - self.issue_type = issue_type - self.message = message - self.suggestion = suggestion + def __init__(self, data: _CorrectionData): + self.filepath = data.filepath + self.line_num = data.line_num + self.issue_type = data.issue_type + self.message = data.message + self.suggestion = data.suggestion def __repr__(self): return (f"Correction(filepath={self.filepath!r}, line={self.line_num}, " @@ -133,13 +126,14 @@ def parse(self, input_lines: 
List[str]) -> List[Correction]: if not file_path.exists(): continue - correction = Correction( + data = _CorrectionData( file_path, line_num, issue_type, message, - suggestion + suggestion or None, ) + correction = Correction(data) self.corrections.append(correction) return self.corrections diff --git a/scripts/apply_heading_corrections.py b/scripts/apply_heading_corrections.py index 3e2b86d0..e50c253a 100644 --- a/scripts/apply_heading_corrections.py +++ b/scripts/apply_heading_corrections.py @@ -28,32 +28,32 @@ import sys from collections import defaultdict from pathlib import Path -from typing import List +from typing import List, NamedTuple -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" +from lib._validation_utils import format_issue_message -# Import shared utilities -if str(scripts_dir) not in sys.path: - sys.path.insert(0, str(scripts_dir)) -from lib._validation_utils import ( # noqa: E402 # pylint: disable=wrong-import-position - format_issue_message, -) +class _CorrectionData(NamedTuple): + """Data for a single heading correction.""" + + filepath: str + line_num: int + current_number: str + corrected_number: str + heading_text: str + level: int class Correction: """Represents a single heading correction.""" - def __init__( # pylint: disable=too-many-positional-arguments - self, filepath, line_num, current_number, corrected_number, heading_text, level - ): - self.filepath = filepath - self.line_num = line_num - self.current_number = current_number - self.corrected_number = corrected_number - self.heading_text = heading_text - self.level = level + def __init__(self, data: _CorrectionData): + self.filepath = data.filepath + self.line_num = data.line_num + self.current_number = data.current_number + self.corrected_number = data.corrected_number + self.heading_text = data.heading_text + self.level = data.level def __repr__(self): return (f"Correction(filepath={self.filepath!r}, line={self.line_num}, " @@ -102,24 +102,22 @@ def parse(self, 
input_lines: List[str]) -> List['Correction']: level = len(heading_prefix) if self.current_file: - correction = Correction( + correction = Correction(_CorrectionData( self.current_file, line_num, current_number, corrected_number, heading_text, - level - ) + level, + )) self.corrections.append(correction) else: warning_msg = format_issue_message( "warning", "Correction without file context", "unknown", - None, - line.strip(), - None, - False + message=line.strip(), + no_color=False, ) print(warning_msg, file=sys.stderr) @@ -164,10 +162,7 @@ def apply_file_corrections(self, filepath, corrections): "error", "File not found", str(filepath), - None, - None, - None, - False + no_color=False, ) print(error_msg, file=sys.stderr) self.failed_count += len(corrections) @@ -197,11 +192,12 @@ def apply_file_corrections(self, filepath, corrections): "error", "Line out of range", str(filepath), - correction.line_num, - (f"Line number {correction.line_num} is out of range " - f"(file has {len(lines)} lines)"), - None, - False + line_num=correction.line_num, + message=( + f"Line number {correction.line_num} is out of range " + f"(file has {len(lines)} lines)" + ), + no_color=False, ) print(error_msg, file=sys.stderr) self.failed_count += 1 @@ -227,10 +223,9 @@ def apply_file_corrections(self, filepath, corrections): "warning", "Could not apply correction", str(filepath), - correction.line_num, - f"Line: {original_line.rstrip()}", - None, - False + line_num=correction.line_num, + message=f"Line: {original_line.rstrip()}", + no_color=False, ) print(warning_msg, file=sys.stderr) self.failed_count += 1 diff --git a/scripts/audit_feature_coverage.py b/scripts/audit_feature_coverage.py index 86ff38c3..5f0b0616 100644 --- a/scripts/audit_feature_coverage.py +++ b/scripts/audit_feature_coverage.py @@ -479,12 +479,12 @@ def _collect_coverage_issues( ) if not count: - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "missing_feature_coverage", req_file, line_num, 
line_num, - f"Requirement {req_id} has no feature files referencing it", + message=f"Requirement {req_id} has no feature files referencing it", severity='error', req_id=req_id )) @@ -498,12 +498,12 @@ def _collect_coverage_issues( f"Required: {', '.join(sorted(expected_specs))}. " f"Missing: {', '.join(sorted(missing_specs))}" ) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "spec_mismatch", req_file, line_num, line_num, - message, + message=message, severity='error', req_id=req_id, feature_file=str(feature_file), diff --git a/scripts/audit_requirements_coverage.py b/scripts/audit_requirements_coverage.py index ab6e0a58..e6fbf6c6 100644 --- a/scripts/audit_requirements_coverage.py +++ b/scripts/audit_requirements_coverage.py @@ -125,7 +125,8 @@ def _run_index_ref_check( output.add_blank_line("error") for req_file, line_num, reference in index_ref_errors: error_msg = format_issue_message( - "error", "Index file ref", req_file, line_num, reference, no_color + "error", "Index file ref", req_file, + line_num=line_num, message=reference, no_color=no_color ) output.add_error_line(error_msg) output.add_blank_line("error") @@ -165,7 +166,8 @@ def _run_file_level_coverage( if not count: error_msg = format_issue_message( - "error", "Spec without Req", spec_basename, None, "NO REQUIREMENTS", no_color + "error", "Spec without Req", spec_basename, + message="NO REQUIREMENTS", no_color=no_color ) output.add_error_line(error_msg) missing_specs.append(spec_basename) @@ -176,7 +178,8 @@ def _run_file_level_coverage( output.add_errors_header() for spec_basename in missing_specs: error_msg = format_issue_message( - "error", "Spec without Req", spec_basename, None, "NO REQUIREMENTS", no_color + "error", "Spec without Req", spec_basename, + message="NO REQUIREMENTS", no_color=no_color ) output.add_error_line(error_msg) @@ -214,8 +217,9 @@ def _run_heading_coverage( except (IOError, OSError) as e: if args.verbose: output.add_verbose_line(f" Warning: Could not 
read {spec_basename}: {e}") - issues.append(ValidationIssue( - "file_read_error", spec_relative_path, 0, 0, f"Could not read file: {e}", + issues.append(ValidationIssue.create( + "file_read_error", spec_relative_path, 0, 0, + message=f"Could not read file: {e}", severity='error' )) continue @@ -224,17 +228,19 @@ def _run_heading_coverage( output.add_verbose_line( f" Warning: Could not decode {spec_basename} (encoding issue): {e}" ) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "file_encoding_error", spec_relative_path, 0, 0, - f"Could not decode file (encoding issue): {e}", severity='error' + message=f"Could not decode file (encoding issue): {e}", + severity='error' )) continue except _SCRIPT_ERROR_EXCEPTIONS as e: if args.verbose: output.add_verbose_line(f" Warning: Unexpected error reading {spec_basename}: {e}") - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "unexpected_error", spec_relative_path, 0, 0, - f"Unexpected error reading file: {e}", severity='error' + message=f"Unexpected error reading file: {e}", + severity='error' )) continue @@ -270,14 +276,21 @@ def _process_one_heading( ) except _SCRIPT_ERROR_EXCEPTIONS as e: error_msg = format_issue_message( - "error", "Analysis error", ctx.spec_relative_path, heading.line_num, - f"Failed to extract section content for heading '{heading.heading_text}': {str(e)}", - None, no_color + "error", "Analysis error", ctx.spec_relative_path, + line_num=heading.line_num, + message=( + f"Failed to extract section content for heading " + f"'{heading.heading_text}': {str(e)}" + ), + no_color=no_color ) output.add_error_line(error_msg) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "analysis_error", Path(ctx.spec_relative_path), heading.line_num, heading.line_num, - f"Failed to extract section content for heading '{heading.heading_text}': {str(e)}", + message=( + f"Failed to extract section content for heading " + f"'{heading.heading_text}': 
{str(e)}" + ), severity='error', heading_level=heading.heading_level, heading_text=heading.heading_text, @@ -293,19 +306,25 @@ def _process_one_heading( heading.heading_level, ctx.headings_for_hierarchy, ctx.hierarchy, - MAX_ORGANIZATIONAL_PROSE_LINES + max_prose_lines=MAX_ORGANIZATIONAL_PROSE_LINES ) except (ValueError, IndexError, KeyError) as e: error_msg = format_issue_message( - "error", "Analysis error", ctx.spec_relative_path, heading.line_num, - f"Failed to check if organizational for heading '{heading.heading_text}': {str(e)}", - None, - no_color + "error", "Analysis error", ctx.spec_relative_path, + line_num=heading.line_num, + message=( + f"Failed to check if organizational for heading " + f"'{heading.heading_text}': {str(e)}" + ), + no_color=no_color ) output.add_error_line(error_msg) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "analysis_error", Path(ctx.spec_relative_path), heading.line_num, heading.line_num, - f"Failed to check if organizational for heading '{heading.heading_text}': {str(e)}", + message=( + f"Failed to check if organizational for heading " + f"'{heading.heading_text}': {str(e)}" + ), severity='error', heading_level=heading.heading_level, heading_text=heading.heading_text, @@ -314,18 +333,21 @@ def _process_one_heading( return (issues, excluded) except _SCRIPT_ERROR_EXCEPTIONS as e: error_msg = format_issue_message( - "error", "Analysis error", ctx.spec_relative_path, heading.line_num, - ( + "error", "Analysis error", ctx.spec_relative_path, + line_num=heading.line_num, + message=( "Unexpected error checking organizational heading " f"'{heading.heading_text}': {str(e)}" ), - None, - no_color + no_color=no_color ) output.add_error_line(error_msg) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "analysis_error", Path(ctx.spec_relative_path), heading.line_num, heading.line_num, - f"Unexpected error checking organizational heading '{heading.heading_text}': {str(e)}", + message=( + 
f"Unexpected error checking organizational heading " + f"'{heading.heading_text}': {str(e)}" + ), severity='error', heading_level=heading.heading_level, heading_text=heading.heading_text, @@ -347,10 +369,11 @@ def _process_one_heading( content_note = ( " (no direct content)" if org_result['is_empty'] else " (minor content)" ) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "organizational_heading", Path(ctx.spec_relative_path), heading.line_num, heading.line_num, - f"{heading.heading_text} (#{heading.anchor}){content_note}", severity='warning', + message=f"{heading.heading_text} (#{heading.anchor}){content_note}", + severity='warning', heading_level=heading.heading_level, heading_text=heading.heading_text, anchor=heading.anchor, @@ -369,13 +392,15 @@ def _process_one_heading( ) except _SCRIPT_ERROR_EXCEPTIONS as e: error_msg = format_issue_message( - "error", "Analysis error", ctx.spec_relative_path, heading.line_num, - f"Failed to classify heading '{heading.heading_text}': {str(e)}", None, no_color + "error", "Analysis error", ctx.spec_relative_path, + line_num=heading.line_num, + message=f"Failed to classify heading '{heading.heading_text}': {str(e)}", + no_color=no_color ) output.add_error_line(error_msg) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "analysis_error", Path(ctx.spec_relative_path), heading.line_num, heading.line_num, - f"Failed to extract section content for heading '{heading.heading_text}': {str(e)}", + message=f"Failed to classify heading '{heading.heading_text}': {str(e)}", severity='error', heading_level=heading.heading_level, heading_text=heading.heading_text, @@ -410,10 +435,10 @@ def _process_one_heading( ): return (issues, excluded) severity = classification['severity_if_missing'] - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "missing_requirement", Path(ctx.spec_relative_path), heading.line_num, heading.line_num, - f"{heading.heading_text} (#{heading.anchor}) - 
{classification['reason']}", + message=f"{heading.heading_text} (#{heading.anchor}) - {classification['reason']}", severity=severity, heading_level=heading.heading_level, heading_text=heading.heading_text, @@ -519,9 +544,10 @@ def _handle_classification_result( if kind == 'architectural': return ( - ValidationIssue( + ValidationIssue.create( "architectural_heading", spec_path, line_num, line_num, - f"{heading_text} (architectural: {reason})", severity='warning', + message=f"{heading_text} (architectural: {reason})", + severity='warning', heading=heading_text, heading_level=heading_level, anchor=anchor, reason=reason ), excluded_pairs, @@ -529,9 +555,10 @@ def _handle_classification_result( ) if kind == 'signature_only': return ( - ValidationIssue( + ValidationIssue.create( "signature_only_heading", spec_path, line_num, line_num, - f"{heading_text} (#{anchor}) - {reason}", severity='warning', + message=f"{heading_text} (#{anchor}) - {reason}", + severity='warning', heading_level=heading_level, heading_text=heading_text, anchor=anchor, reason=reason ), excluded_pairs, @@ -539,9 +566,10 @@ def _handle_classification_result( ) if kind == 'example_only': return ( - ValidationIssue( + ValidationIssue.create( "example_only_heading", spec_path, line_num, line_num, - f"{heading_text} (#{anchor}) - {reason}", severity='warning', + message=f"{heading_text} (#{anchor}) - {reason}", + severity='warning', heading_level=heading_level, heading_text=heading_text, anchor=anchor, reason=reason ), excluded_pairs, @@ -549,9 +577,10 @@ def _handle_classification_result( ) if kind == 'non_prose': return ( - ValidationIssue( + ValidationIssue.create( "non_prose_heading", spec_path, line_num, line_num, - f"{heading_text} (#{anchor}) - {reason}", severity='warning', + message=f"{heading_text} (#{anchor}) - {reason}", + severity='warning', heading_level=heading_level, heading_text=heading_text, anchor=anchor, reason=reason ), excluded_pairs, @@ -590,11 +619,13 @@ def 
_run_requirement_refs_to_excluded( req_path, file_cache ): if (spec_basename, anchor) in excluded_headings_set: - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "requirement_references_excluded_heading", Path(req_relative), line_num, line_num, - f"references excluded heading {spec_basename}#{anchor} - " - "requirement may be out of date or overly implementation-specific", + message=( + f"references excluded heading {spec_basename}#{anchor} - " + "requirement may be out of date or overly implementation-specific" + ), severity='warning', spec_basename=spec_basename, anchor=anchor )) except _SCRIPT_ERROR_EXCEPTIONS as e: @@ -756,9 +787,8 @@ def _has_errors(counters: dict) -> bool: ) if _has_warnings_only(counters) and total_headings > 0: - output.add_line( - "Note: Warnings are shown above, but they do not cause the audit to fail.", - section="final_message" + output.add_warnings_only_message( + message="Note: Warnings are shown above, but they do not cause the audit to fail.", ) elif total_headings > 0: output.add_success_message("All H2+ headings have requirement coverage!") diff --git a/scripts/lib/README.md b/scripts/lib/README.md index 067208c0..be00d36c 100644 --- a/scripts/lib/README.md +++ b/scripts/lib/README.md @@ -4,6 +4,7 @@ - [2. Shared Utilities](#2-shared-utilities) - [2.1 Go Code Utils Module](#21-go-code-utils-module) - [2.2 Validation Utils Module](#22-validation-utils-module) + - [2.3 Heading Numbering Module](#23-heading-numbering-module) - [3. Go Definitions Index Subsystem](#3-go-definitions-index-subsystem) - [4. Requirements Coverage Audit Subsystem](#4-requirements-coverage-audit-subsystem) - [5. Tests](#5-tests) @@ -50,10 +51,11 @@ Used by: Shared utilities for validation scripts, including output formatting, error reporting, and common helpers. Location: [`scripts/lib/_validation_utils.py`](../lib/_validation_utils.py). 
+Implementation lives in [`scripts/lib/validation/`](../lib/validation/) (`_core`, `_output`, `_fs`, `_markdown`); `_validation_utils` re-exports from lib.validation for backward compatibility. Core functionality: -- `OutputBuilder` - Class for building formatted validation output. +- `OutputBuilder` - Class for building formatted validation output; use `add_success_message()`, `add_failure_message()`, or `add_warnings_only_message()` for the final message (success, errors, or warnings-only). - `ValidationIssue` - Unified class for representing validation errors and warnings. - `get_workspace_root()` - Find repository root directory. - `find_markdown_files()` - Find markdown files in specified paths. @@ -92,6 +94,20 @@ Used by: - [`scripts/validate_links.py`](../validate_links.py) - [`scripts/validate_req_references.py`](../validate_req_references.py) +### 2.3 Heading Numbering Module + +Modules used by `validate_heading_numbering.py` for heading structure and numbering checks. + +Location: [`scripts/lib/heading_numbering/`](../lib/heading_numbering/). + +Core functionality: + +- **\_checks.py** – Check functions: `check_organizational_headings`, `check_heading_capitalization`, `check_h2_period_consistency`, `check_duplicate_headings`, `check_excessive_numbering`, `check_single_word_headings`. + +Used by: + +- [`scripts/validate_heading_numbering.py`](../validate_heading_numbering.py) + ## 3. Go Definitions Index Subsystem Modules in [`scripts/lib/go_defs_index/`](../lib/go_defs_index/) implement the phased pipeline for `validate_api_go_defs_index.py`. 
diff --git a/scripts/lib/_go_code_utils.py b/scripts/lib/_go_code_utils.py index 7a9514db..e45b22c7 100644 --- a/scripts/lib/_go_code_utils.py +++ b/scripts/lib/_go_code_utils.py @@ -7,1877 +7,67 @@ - Parsing Go function, method, and type signatures - Normalizing Go signatures and type names - Detecting example code (single lines and entire code blocks) -""" - -import re -import sys -from pathlib import Path -from typing import List, Tuple, Optional, Dict -from dataclasses import dataclass - -# Import heading utility from validation_utils to avoid duplication -lib_dir = Path(__file__).parent -scripts_dir = lib_dir.parent -if str(scripts_dir) not in sys.path: - sys.path.insert(0, str(scripts_dir)) - -from lib._validation_utils import find_heading_for_code_block # noqa: E402 - -# Example detection markers -EXAMPLE_MARKERS = [ - 'hypothetical', 'not the actual', 'this is not', 'not a real', - 'example only', 'example type', 'example interface', 'example struct', - 'example version', 'example pattern', 'illustration only', - "not an actual", "shown for illustration" -] - -EXAMPLE_NAME_PREFIXES = ('Example', 'Hypothetical', 'Mock', 'Test') +Facade: re-exports from lib.go_markdown for backward compatibility. +Implementation lives in scripts/lib/go_markdown/ (_base, _rest). 
+""" -# Compiled regex patterns for performance -_RE_TYPE_NAME = re.compile(r'^\s*type\s+(\w+)') -_RE_FUNC_NAME = re.compile(r'^\s*func\s+(?:\([^)]+\)\s+)?(\w+)') -_RE_GO_COMMENT_SINGLE = re.compile(r'//.*$') -_RE_GO_COMMENT_SINGLE_MULTILINE = re.compile(r'//.*$', re.MULTILINE) -_RE_GO_COMMENT_MULTI = re.compile(r'/\*.*?\*/', flags=re.DOTALL) -_RE_GO_DOC_LINE = re.compile(r'^\s*//\s?(.*)$') -_RE_GO_BLOCK_COMMENT_START = re.compile(r'^\s*/\*\s?(.*)$') -_RE_GO_BLOCK_COMMENT_END = re.compile(r'^(.*)\*/\s*$') -_RE_INTERFACE_PATTERN = re.compile(r'^\s*(?:type\s+)?(\w+)(?:\s*\[[^\]]+\])?\s+interface\s*\{') -_RE_STRUCT_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+struct\s*\{') -_RE_ALIAS_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s*=\s') -_RE_POINTER_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+\*') -_RE_SLICE_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+\[\]') -_RE_MAP_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+map\s*\[') -_RE_TYPE_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+\S') -_RE_AFTER_TYPE_PATTERN = re.compile(r'^\s*type\s+\w+(?:\s*\[[^\]]+\])?\s+(.+)') -_RE_REMOVE_BRACE = re.compile(r'\s*\{.*$') -_RE_METHOD_PATTERN = re.compile( - r'^\s*func\s+(\([^)]+\))\s+(\w+)(?:\s*\[[^\]]+\])?\s*\(([^)]*)\)\s*(.*)$' +from lib.go_markdown import ( + EXAMPLE_MARKERS, + EXAMPLE_NAME_PREFIXES, + InterfaceParser, + Signature, + check_kind_word_after, + count_go_definitions, + determine_type_kind, + extract_go_doc_comment_above, + extract_interfaces_from_go_file, + extract_interfaces_from_markdown, + extract_receiver_type, + find_definition_line_index, + find_first_definition, + find_go_code_blocks, + is_continuation_line, + is_definition_start_line, + is_example_code, + is_example_definition, + is_example_signature_name, + is_in_go_code_block, + is_public_name, + is_signature_only_code_block, + normalize_generic_name, + normalize_go_signature, + normalize_go_signature_with_params, + 
parse_go_def_signature, + remove_go_comments, ) -_RE_FUNC_PATTERN = re.compile(r'^\s*func\s+(\w+)(?:\s*\[[^\]]+\])?\s*\(([^)]*)\)\s*(.*)$') -_RE_RECEIVER_TYPE = re.compile(r'^\s*(?:\w+\s+)?(?:\*)?\s*(\w+(?:\[[^\]]+\])?)') -_RE_WHITESPACE = re.compile(r'\s+') -_RE_GENERIC_PARAMS = re.compile(r'\[[^\]]+\]') -_RE_PACKAGE_TYPE = re.compile(r'\b([a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)*)\.([A-Z][A-Za-z0-9_]*)\b') -_RE_METHOD_NORMALIZE = re.compile(r'func\s+(\([^)]+\))\s+(\w+)\s*\(([^)]*)\)\s*(.*)$') -_RE_FUNC_NORMALIZE = re.compile(r'func\s+(\w+)\s*\(([^)]*)\)\s*(.*)$') -_RE_FUNC_WITH_PARAMS = re.compile(r'func\s+(?:\([^)]+\)\s+)?(\w+)\s*\(([^)]*)\)\s*(.*)$') -_RE_RECEIVER_MATCH = re.compile(r'func\s+(\([^)]+\))\s+') -_RE_WHITESPACE_NORMALIZE = re.compile(r'\s+') -_RE_GENERICS_TAG = re.compile(r'\bgenerics\.(Tag|TagValueType|PathEntry)\b') -_RE_METADATA_TYPES = re.compile( - r'\bmetadata\.(PackageMetadata|PackageInfo|FileEntry|PathMetadataEntry|ProcessingState)\b' -) -_RE_FILEFORMAT_TYPES = re.compile(r'\bfileformat\.(PackageHeader|FileIndex|IndexEntry)\b') -_RE_HEADER_TYPE = re.compile(r'\bHeader\b') -_RE_PKGERRORS_TYPES = re.compile(r'\bpkgerrors\.(ErrorType|PackageError)\b') -_RE_SIGNATURES_TYPES = re.compile(r'\bsignatures\.(Signature|SignatureInfo)\b') -_RE_FUNC_TYPE_DEF = re.compile(r'^\s*type\s+\w+(?:\s*\[[^\]]+\])?\s+func\s*\(') - - -def find_go_code_blocks(content: str) -> List[Tuple[int, int, str]]: - """ - Find all Go code blocks in markdown content. - - Args: - content: Markdown content as string - - Returns: - List of tuples: (start_line, end_line, code_content) - Lines are 1-indexed. 
- """ - go_blocks = [] - lines = content.split('\n') - - i = 0 - while i < len(lines): - line = lines[i] - - # Check for Go code block start - if line.strip() == '```go': - start_line = i + 1 # 1-indexed for reporting - code_lines = [] - i += 1 - - # Collect code until closing ``` - while i < len(lines) and lines[i].strip() != '```': - code_lines.append(lines[i]) - i += 1 - - if i < len(lines): # Found closing ``` - code_content = '\n'.join(code_lines) - go_blocks.append((start_line, i + 1, code_content)) - - i += 1 - - return go_blocks - - -def is_in_go_code_block(content: str, line_num: int) -> bool: - """ - Check if a given line number is inside a Go code block. - - Args: - content: Markdown content as string - line_num: Line number to check (1-indexed) - - Returns: - True if the line is inside a ```go code block - """ - lines = content.split('\n') - in_go_block = False - - for i, line in enumerate(lines[:line_num], 1): - if line.strip() == '```go': - in_go_block = True - elif line.strip() == '```' and in_go_block: - in_go_block = False - - return in_go_block - - -def is_example_code( - code: str, - start_line: int, - content: Optional[str] = None, - lines: Optional[List[str]] = None, - heading_text: Optional[str] = None, - auto_find_heading: bool = False, - check_prose_before_block: bool = True, - check_single_line: Optional[int] = None, - max_lines_to_check: int = 5 -) -> bool: - """ - Check if Go code is example code. 
- - This unified function can check: - - A single line within a code block - - Multiple lines in a code block (default: first 5 lines) - - An entire code block - - Looks for example markers in: - - The heading above the code block (if provided or auto-found) - - Prose text immediately before the code block (if check_prose_before_block is True) - - Previous lines within the code block - - The name of the type/function definition - - Args: - code: The code block content (without ```go markers) OR a single line - start_line: Line number where the code block starts (1-indexed) - content: Full markdown content (preferred - used for heading finding and prose checking) - lines: All lines of the file as a list - (alternative to content; content will be derived if needed) - heading_text: Optional heading text above the code block - auto_find_heading: If True, automatically find heading from content - check_prose_before_block: If True, check prose text between heading and code block - check_single_line: If provided (0-indexed line number within code), check only that line - max_lines_to_check: Maximum number of lines to check in code block (default: 5) - - Returns: - True if the code appears to be example code - """ - # Derive content from lines if needed (for heading finding) - if content is None and lines is not None: - content = '\n'.join(lines) - - # Derive lines from content if needed (for prose checking) - if lines is None: - if content: - lines = content.split('\n') - else: - # If we have code but no content/lines, create minimal lines list - code_lines = code.split('\n') - lines = [''] * (start_line - 1) + code_lines - - # Handle auto-finding heading (requires content) - if auto_find_heading and content: - heading_text = find_heading_for_code_block(content, start_line) - - if not lines: - return False - - # Check heading for example indicators (highest priority) - done once per code block - if heading_text: - heading_lower = heading_text.lower() - if any(marker in 
heading_lower for marker in EXAMPLE_MARKERS): - return True - # Also check for standalone "example" word in heading - if "example" in heading_lower: - return True - - # Check prose text immediately before the code block (between heading and code start) - # Done once per code block - if check_prose_before_block and start_line > 1: - # Look back up to 10 lines before the code block for example indicators - prose_start = max(0, start_line - 11) # 0-indexed - prose_end = start_line - 1 # 0-indexed, exclusive - for j in range(prose_start, prose_end): - if j < 0 or j >= len(lines): - continue - prose_line = lines[j] - # Skip empty lines and markdown code block markers - if not prose_line.strip() or prose_line.strip() in ('```', '```go'): - continue - # Skip markdown headings (they're already checked) - if prose_line.strip().startswith('#'): - continue - prose_lower = prose_line.lower() - if any(marker in prose_lower for marker in EXAMPLE_MARKERS): - return True - # Also check for standalone "example" word in prose, - # but only if it's clearly an example marker - # Skip if it's just "for example" or "example:" in normal prose - if "example" in prose_lower: - # Only flag if it's clearly an example marker, not just "for example" in prose - example_phrases = [ - "this is an example", "example:", "example code", "example only", - "example type", "example interface" - ] - if any(phrase in prose_lower for phrase in example_phrases): - return True - # Skip common phrases that use "example" but aren't marking example code - example_markers = ["example code", "example type", "example interface"] - if "for example" in prose_lower and not any( - marker in prose_lower for marker in example_markers - ): - continue - - # Determine what lines to check - if check_single_line is not None: - # Check single line within code block - line_index = start_line - 1 + check_single_line # Convert to 0-indexed - if line_index < 0 or line_index >= len(lines): - return False - lines_to_check = 
[line_index] - else: - # Check multiple lines (code block) - code_lines = code.split('\n') - if not code_lines: - return False - - # Check the first few lines of the code block for example markers - lines_to_check = [] - for i, code_line in enumerate(code_lines[:max_lines_to_check]): - if code_line.strip() and not code_line.strip().startswith('```'): - line_idx = start_line - 1 + i # Convert to 0-indexed - if line_idx < len(lines): - lines_to_check.append(line_idx) - - # Check each line for example indicators - for line_index in lines_to_check: - if line_index < start_line - 1: - continue - - # Check previous lines in this code block for example indicators - for j in range(max(start_line - 1, line_index - 5), line_index): - if j < 0 or j >= len(lines): - continue - prev_line = lines[j] - prev_lower = prev_line.lower() - if any(marker in prev_lower for marker in EXAMPLE_MARKERS): - return True - # Also check for standalone "example" word in code comments, but be more specific - if "example" in prev_lower: - # Only flag if it's clearly an example marker in comments - comment_example_phrases = [ - "// example", "// example:", "example code", "example only" - ] - if any(phrase in prev_lower for phrase in comment_example_phrases): - return True - # Skip if it's just "for example" in a comment - if "for example" in prev_lower: - continue - - # Check if type/function name indicates it's an example - line = lines[line_index] if line_index < len(lines) else '' - stripped = line.strip() - - # Check for type definitions - type_name_match = _RE_TYPE_NAME.match(stripped) - if type_name_match: - type_name = type_name_match.group(1) - if type_name.startswith(EXAMPLE_NAME_PREFIXES): - return True - - # Check for function/method definitions - func_name_match = _RE_FUNC_NAME.match(stripped) - if func_name_match: - func_name = func_name_match.group(1) - if func_name.startswith(EXAMPLE_NAME_PREFIXES): - return True - - return False - - -def is_example_signature_name(name: str) -> 
bool: - """ - Check if a signature name indicates it's an example. - - Args: - name: Signature name to check - - Returns: - True if the name starts with example prefixes - """ - return name.startswith(EXAMPLE_NAME_PREFIXES) - - -def remove_go_comments(text: str, multiline: bool = False) -> str: - """ - Remove Go comments from text (single or multi-line). - - Args: - text: Go code text - multiline: If True, handles multi-line strings and block comments. - If False, also strips whitespace (for single-line usage). - - Returns: - Text with comments removed (and stripped if multiline=False) - """ - if multiline: - text = _RE_GO_COMMENT_SINGLE_MULTILINE.sub('', text) - text = _RE_GO_COMMENT_MULTI.sub('', text) - else: - text = _RE_GO_COMMENT_SINGLE.sub('', text) - # For single-line usage, strip whitespace (matches original behavior) - text = text.strip() - return text - - -def extract_go_doc_comment_above( - code_lines: List[str], - definition_line_index: int, -) -> str: - """ - Extract doc comment text immediately above a definition line. - - This is an additive helper intended for future scoring improvements. - - Args: - code_lines: List of Go code lines (no markdown fences). - definition_line_index: 0-based index of the definition line within code_lines. - - Returns: - Normalized doc comment text, or empty string if none. - """ - if not code_lines: - return "" - if definition_line_index <= 0 or definition_line_index > len(code_lines) - 1: - return "" - - # Walk upward collecting contiguous comment lines / blocks. - collected: List[str] = [] - i = definition_line_index - 1 - - def _should_skip_doc_line(text: str) -> bool: - # Skip TODO/FIXME lines, but keep other doc comment content. 
- t = (text or "").strip() - if not t: - return True - upper = t.upper() - return upper.startswith("TODO:") or upper.startswith("FIXME:") - - while i >= 0: - raw = code_lines[i].rstrip("\n") - stripped = raw.strip() - - if not stripped: - # Allow blank lines between comment lines but stop if we already started - # collecting and then hit a blank line (doc comments must be adjacent). - if collected: - break - i -= 1 - continue - - # Single-line doc comment: // ... - m = _RE_GO_DOC_LINE.match(raw) - if m: - text = m.group(1).strip() - if text and not _should_skip_doc_line(text): - collected.insert(0, text) - i -= 1 - continue - - # Inline block comment: /* ... */ - if "/*" in stripped and "*/" in stripped: - inner = _RE_GO_COMMENT_MULTI.sub(lambda mm: mm.group(0)[2:-2], stripped) - inner = inner.strip() - if inner and not _should_skip_doc_line(inner): - collected.insert(0, inner) - i -= 1 - continue - - # Multi-line block comment ending on this line: ... */ - end_match = _RE_GO_BLOCK_COMMENT_END.match(stripped) - if end_match and "/*" not in stripped: - block_parts: List[str] = [] - end_text = end_match.group(1).strip() - if end_text and not _should_skip_doc_line(end_text): - block_parts.insert(0, end_text) - - i -= 1 - while i >= 0: - raw2 = code_lines[i].rstrip("\n") - stripped2 = raw2.strip() - start_match = _RE_GO_BLOCK_COMMENT_START.match(stripped2) - if start_match: - start_text = start_match.group(1).strip() - if start_text and start_text != "*/": - cleaned = start_text.replace("*/", "").strip() - if cleaned and not _should_skip_doc_line(cleaned): - block_parts.insert(0, cleaned) - break - - if stripped2: - # Strip leading "*" for common block comment style. 
- if stripped2.startswith("*"): - stripped2 = stripped2[1:].strip() - cleaned = stripped2.replace("*/", "").strip() - if cleaned and not _should_skip_doc_line(cleaned): - block_parts.insert(0, cleaned) - i -= 1 - - block_text = " ".join([p for p in block_parts if p]).strip() - if block_text: - collected.insert(0, block_text) - i -= 1 - continue - - # Not a comment line; stop. - break - - # Normalize whitespace. - out = " ".join(collected).strip() - out = _RE_WHITESPACE_NORMALIZE.sub(" ", out) - return out - - -def determine_type_kind(line: str) -> Optional[str]: - """ - Determine the kind of a Go type definition from a line. - - This function extracts the kind ('interface', 'struct', 'alias', or 'type') from a Go type - definition line. It checks interfaces first, then structs, then type aliases, then other types. - - Args: - line: Line of Go code - - Returns: - 'interface', 'struct', 'alias', 'pointer', 'slice', 'map', 'type', - or None if not a type definition - - Examples: - - "type Package interface {" -> 'interface' - - "type FileEntry struct {" -> 'struct' - - "type ProcessingState uint8" -> 'type' - - "type Option[T] struct {" -> 'struct' - - "type Name = SomeType" -> 'alias' (type alias) - - "type Name[T] = SomeType[T]" -> 'alias' (generic type alias) - - "type Name *SomeType" -> 'pointer' (pointer type) - - "type Name []SomeType" -> 'slice' (slice type) - - "type Name map[K]V" -> 'map' (map type) - - "type Name SomeType" -> 'type' (regular type definition) - """ - line_clean = remove_go_comments(line) - - # Check for interface definitions FIRST (before type definitions) - # This ensures interfaces are correctly classified, not as types - # Pattern: type Name interface { or Name interface { - interface_match = _RE_INTERFACE_PATTERN.match(line_clean) - if interface_match: - return 'interface' - - # Check for struct definitions (distinct from other types) - # Pattern: type Name struct { or type Name[T] struct { - struct_match = 
_RE_STRUCT_PATTERN.match(line_clean) - if struct_match: - return 'struct' - - # Check for type aliases: type Name = Type or type Name[T] = Type - # This must be checked before other type definitions - alias_match = _RE_ALIAS_PATTERN.match(line_clean) - if alias_match: - return 'alias' - - # Check for pointer types: type Name *SomeType or type Name[T] *SomeType - pointer_match = _RE_POINTER_PATTERN.match(line_clean) - if pointer_match: - return 'pointer' - - # Check for slice types: type Name []SomeType or type Name[T] []SomeType - slice_match = _RE_SLICE_PATTERN.match(line_clean) - if slice_match: - return 'slice' - - # Check for map types: type Name map[K]V or type Name[T] map[K]V - map_match = _RE_MAP_PATTERN.match(line_clean) - if map_match: - return 'map' - - # Check for other type definitions - # (custom types, etc. - excludes structs, interfaces, aliases, pointers, slices, maps) - # Pattern: type Name SomeType or type Name[T] SomeType - # This handles regular type definitions (may or may not have generics) - type_match = _RE_TYPE_PATTERN.match(line_clean) - if type_match: - # Make sure it's not already matched by struct/interface/alias/pointer/slice/map patterns - # and it's not a function type - # Check that it doesn't start with pointer, slice, or map patterns - if ('struct' not in line_clean and 'interface' not in line_clean - and '=' not in line_clean and 'func(' not in line_clean): - # Check if it's a pointer, slice, or map (already handled above) - # by checking if the pattern after type name matches those - after_type_match = _RE_AFTER_TYPE_PATTERN.match(line_clean) - if after_type_match: - after_type = after_type_match.group(1).strip() - # If it doesn't start with *, [], or map[, it's a regular type - if (not after_type.startswith('*') - and not after_type.startswith('[]') - and not after_type.startswith('map[')): - return 'type' - - return None - - -def parse_go_def_signature(line: str, location: str = "") -> Optional[Signature]: - """ - Parse a Go 
definition signature from a line (function, method, or type). - - Args: - line: Line of Go code - location: Optional location string (file path and line number) - - Returns: - Signature object or None if no definition found - - For functions/methods: kind='func' or 'method', includes params and returns - - For types: kind='type', 'interface', 'struct', etc., includes generic_params - """ - line_clean = remove_go_comments(line) - - # Try to parse as function/method first - # Remove opening brace if present - line_no_brace = _RE_REMOVE_BRACE.sub('', line_clean).strip() - - # Method: func (r *Receiver) Name(params) returns - method_match = _RE_METHOD_PATTERN.match(line_no_brace) - if method_match: - receiver_str = method_match.group(1) - name = method_match.group(2) - params = method_match.group(3) - returns = method_match.group(4).strip() - receiver_type = extract_receiver_type(receiver_str) - return Signature( - name=name, - kind='method', - receiver=receiver_type, - params=params, - returns=returns, - location=location, - is_public=is_public_name(name) - ) - - # Function: func Name(params) returns - func_match = _RE_FUNC_PATTERN.match(line_no_brace) - if func_match: - name = func_match.group(1) - params = func_match.group(2) - returns = func_match.group(3).strip() - return Signature( - name=name, - kind='func', - params=params, - returns=returns, - location=location, - is_public=is_public_name(name) - ) - - # Try to parse as type definition - kind = determine_type_kind(line_clean) - if kind is not None: - # Special-case: interfaces may be written as: - # - type Name interface { ... } - # - Name interface { ... } - # - # determine_type_kind() supports both forms, but the generic type match below - # only matches "type Name ...", so handle interface explicitly. 
- if kind == 'interface': - interface_match = re.match( - r'^\s*(?:type\s+)?(\w+)(?:\s*(\[[^\]]+\]))?\s+interface\s*\{', - line_clean, - ) - if interface_match: - name = interface_match.group(1) - generic_params = interface_match.group(2) # e.g., "[T any]" - return Signature( - name=name, - kind='interface', - generic_params=generic_params, - location=location, - is_public=is_public_name(name), - ) - - # Extract name and generic parameters - type_match = re.match( - r'^\s*type\s+(\w+)(?:\s*(\[[^\]]+\]))?\s+', - line_clean - ) - if type_match: - name = type_match.group(1) - generic_params = type_match.group(2) # e.g., "[T any]" - return Signature( - name=name, - kind=kind, # 'type', 'interface', 'struct', 'alias', etc. - generic_params=generic_params, - location=location, - is_public=is_public_name(name) - ) - - return None - - -def extract_receiver_type(receiver_str: str, normalize_generics: bool = False) -> str: - """ - Extract the type name from a receiver string. - - Args: - receiver_str: Receiver string like "(r *Receiver)" or "(o *Option[T])" or just "Package" - normalize_generics: If True, remove generic parameters from the type name - - Returns: - Type name (e.g., "Receiver" or "Option") - """ - # If already just a type name (starts with capital, no parentheses), return as-is - if receiver_str and receiver_str[0].isupper() and '(' not in receiver_str: - if normalize_generics: - return normalize_generic_name(receiver_str) - return receiver_str - - # Remove parentheses if present - receiver_clean = receiver_str.strip('()').strip() - - # Pattern: variableName *TypeName or variableName TypeName - # Also handle: *TypeName (no variable name) - match = _RE_RECEIVER_TYPE.match(receiver_clean) - if match: - type_name = match.group(1) - if normalize_generics: - return normalize_generic_name(type_name) - return type_name - - # Fallback: split by spaces and take last part - parts = receiver_clean.split() - if len(parts) >= 2: - # Has variable name, last part is type - 
type_name = parts[-1] - if normalize_generics: - return normalize_generic_name(type_name) - return type_name - elif len(parts) == 1: - # Single word - could be type name or pointer - single_word = parts[0] - # Remove leading * if present - if single_word.startswith('*'): - type_name = single_word[1:] - else: - type_name = single_word - if normalize_generics: - return normalize_generic_name(type_name) - return type_name - - # If single word and starts with capital, it's likely already a type name - if receiver_clean and receiver_clean[0].isupper(): - # Remove leading * if present - if receiver_clean.startswith('*'): - type_name = receiver_clean[1:] - else: - type_name = receiver_clean - if normalize_generics: - return normalize_generic_name(type_name) - return type_name - - return receiver_clean - - -def normalize_generic_name(name: str) -> str: - """ - Normalize a generic type name by removing generic parameters. - - Args: - name: Type name that may include generics (e.g., "Option[T]", "BufferPool[T any]") - - Returns: - Base type name without generics (e.g., "Option", "BufferPool") - - Examples: - - "Option[T]" -> "Option" - - "BufferPool[T]" -> "BufferPool" - - "ConfigBuilder[T]" -> "ConfigBuilder" - - "Option" -> "Option" (no change if no generics) - - "Container[Option[T]]" -> "Container" (handles nested generics) - - "Type[]" -> "Type" (handles empty brackets) - """ - # Remove generic parameters like [T], [T any], [T, U], [], etc. 
- # Handle nested brackets by repeatedly removing the rightmost bracket pair - # This ensures we remove innermost brackets first - result = name - while True: - # Find the rightmost [ that has a matching ] - # We'll work backwards to find balanced brackets - last_open = result.rfind('[') - if last_open == -1: - break # No more brackets - - # Find the matching closing bracket - bracket_count = 0 - found_close = False - for i in range(last_open, len(result)): - if result[i] == '[': - bracket_count += 1 - elif result[i] == ']': - bracket_count -= 1 - if bracket_count == 0: - # Found matching bracket, remove this bracket pair - result = result[:last_open] + result[i + 1:] - found_close = True - break - - if not found_close: - # Unmatched bracket, just remove the [ - result = result[:last_open] + result[last_open + 1:] - - return result - - -def _normalize_go_signature_preprocessing( - sig_str: str, use_whitespace_normalize: bool = False -) -> str: - """Common preprocessing for signature normalization. - - Args: - sig_str: Go signature string - use_whitespace_normalize: If True, use _RE_WHITESPACE_NORMALIZE, - else _RE_WHITESPACE - - Returns: - Preprocessed signature string - """ - # Remove comments - sig_str = remove_go_comments(sig_str, multiline=True) - - # Normalize whitespace - if use_whitespace_normalize: - sig_str = _RE_WHITESPACE_NORMALIZE.sub(' ', sig_str) - else: - sig_str = _RE_WHITESPACE.sub(' ', sig_str) - sig_str = sig_str.strip() - - # Remove generic type parameters - sig_str = _RE_GENERIC_PARAMS.sub('', sig_str) - - return sig_str - - -def _normalize_package_names_general(sig_str: str) -> str: - """Normalize package-qualified type names to short names (general approach). - - Pattern: package.Type -> Type - Only normalizes internal NovusPack packages, not standard library packages. 
- """ - standard_lib_packages = { - 'context', 'errors', 'fmt', 'io', 'os', 'strings', 'bytes', 'time', - 'sync', 'reflect', 'encoding', 'encoding/json', 'encoding/binary', - 'crypto', 'net', 'path', 'path/filepath', 'syscall', 'unicode', - 'math', 'sort', 'strconv', 'bufio', 'compress', 'archive', 'hash' - } - - def replace_package_type(match): - full_match = match.group(0) - package_part = match.group(1) - type_name = match.group(2) - - # Check if this is a standard library package - base_package = ( - package_part.split('.')[0] if '.' in package_part else package_part - ) - if base_package in standard_lib_packages: - return full_match # Keep standard library types as-is - - # For internal packages, return just the type name - return type_name - - return _RE_PACKAGE_TYPE.sub(replace_package_type, sig_str) - - -def _normalize_package_names_specific(sig_str: str) -> str: - """Normalize package names using specific regex substitutions. - - For sync validation. Handles re-exported types: generics.X -> X, - metadata.X -> X, etc. - """ - sig_str = _RE_GENERICS_TAG.sub(r'\1', sig_str) - sig_str = _RE_METADATA_TYPES.sub(r'\1', sig_str) - sig_str = _RE_FILEFORMAT_TYPES.sub(r'\1', sig_str) - sig_str = _RE_HEADER_TYPE.sub('PackageHeader', sig_str) - sig_str = _RE_PKGERRORS_TYPES.sub(r'\1', sig_str) - sig_str = _RE_SIGNATURES_TYPES.sub(r'\1', sig_str) - return sig_str - - -def _normalize_returns_simple( - returns: str, normalize_param_list_func -) -> Tuple[str, bool]: - """Normalize return values - simple approach (removes names). 
- - Returns: - Tuple of (normalized_returns, has_multiple_returns) - """ - normalized_returns = "" - has_multiple_returns = False - if returns: - returns_stripped = returns.strip() - if returns_stripped.startswith('(') and returns_stripped.endswith(')'): - returns_content = returns_stripped[1:-1].strip() - normalized_returns = normalize_param_list_func(returns_content) - has_multiple_returns = True - elif ',' in returns_stripped: - normalized_returns = normalize_param_list_func(returns_stripped) - has_multiple_returns = True - else: - normalized_returns = normalize_param_list_func(returns_stripped) - has_multiple_returns = False - return normalized_returns, has_multiple_returns - - -def _extract_receiver_type_safe(receiver_str: str) -> str: - """Extract receiver type safely, handling both (Type) and (var *Type) formats.""" - receiver_clean = receiver_str.strip('()').strip() - # Check if it's already just a type name (single word, starts with capital) - if (len(receiver_clean.split()) == 1 and receiver_clean and - receiver_clean[0].isupper()): - return receiver_clean - - # Has variable name, extract type - receiver_type = extract_receiver_type(receiver_str, normalize_generics=False) - # Fallback: if extraction failed, try to get last word - if not receiver_type or receiver_type == receiver_str.strip('()'): - parts = receiver_clean.split() - if len(parts) >= 2: - receiver_type = parts[-1] # Last part is the type - else: - receiver_type = receiver_clean - return receiver_type - - -def _format_normalized_signature( - name: str, - normalized_params: str, - normalized_returns: str, - receiver_type: Optional[str] = None, - has_multiple_returns: bool = False, - always_paren_returns: bool = False -) -> str: - """ - Format a normalized Go signature string from its components. 
- - Args: - name: Function/method name - normalized_params: Normalized parameter list (types only) - normalized_returns: Normalized return values - receiver_type: Receiver type (if method, None for function) - has_multiple_returns: Whether there are multiple return values - (for simple normalization) - always_paren_returns: If True, always use parentheses for returns - (for param-preserving normalization) - - Returns: - Formatted signature string - """ - # Format return values - if normalized_returns: - # Determine if we should use parentheses - use_parens = always_paren_returns or has_multiple_returns - if use_parens: - returns_str = f"({normalized_returns})" - else: - returns_str = normalized_returns - else: - returns_str = "" - - # Format the signature - if receiver_type: - # Method with receiver - if returns_str: - return f"func ({receiver_type}) {name}({normalized_params}) {returns_str}" - else: - return f"func ({receiver_type}) {name}({normalized_params})" - else: - # Function without receiver - if returns_str: - return f"func {name}({normalized_params}) {returns_str}" - else: - return f"func {name}({normalized_params})" - - -def normalize_go_signature(sig_str: str) -> str: - """ - Normalize a Go signature string for comparison. - - Removes comments, normalizes whitespace, standardizes package names, - and extracts receiver types properly. 
- - Args: - sig_str: Go signature string - - Returns: - Normalized signature string - """ - # Common preprocessing - sig_str = _normalize_go_signature_preprocessing(sig_str, use_whitespace_normalize=False) - - # Normalize package names (general approach) - sig_str = _normalize_package_names_general(sig_str) - - # Normalize parameter list (remove parameter names, keep types) - def normalize_param_list(param_str: str) -> str: - if not param_str.strip(): - return "" - # Simple normalization: remove parameter names - # Pattern: name Type -> Type - params = [] - for param in param_str.split(','): - param = param.strip() - parts = param.split() - if len(parts) >= 2: - # Has name and type: keep type part - params.append(' '.join(parts[1:])) - else: - params.append(param) - return ", ".join(params) - - # Extract and normalize function signatures - # Handle method with receiver - receiver can be in format (Type) or (var *Type) - method_match = _RE_METHOD_NORMALIZE.match(sig_str) - if method_match: - receiver_str = method_match.group(1) - name = method_match.group(2) - params = method_match.group(3) - returns = method_match.group(4).strip() - - # Extract receiver type - receiver_type = _extract_receiver_type_safe(receiver_str) - normalized_params = normalize_param_list(params) - - # Normalize return values - normalized_returns, has_multiple_returns = _normalize_returns_simple( - returns, normalize_param_list - ) - - # Format signature using shared helper - return _format_normalized_signature( - name=name, - normalized_params=normalized_params, - normalized_returns=normalized_returns, - receiver_type=receiver_type, - has_multiple_returns=has_multiple_returns, - always_paren_returns=False - ) - - # Check for function without receiver - func_match = _RE_FUNC_NORMALIZE.match(sig_str) - if func_match: - name = func_match.group(1) - params = func_match.group(2) - returns = func_match.group(3).strip() - - normalized_params = normalize_param_list(params) - - # Normalize return 
values - normalized_returns, has_multiple_returns = _normalize_returns_simple( - returns, normalize_param_list - ) - - # Format signature using shared helper - return _format_normalized_signature( - name=name, - normalized_params=normalized_params, - normalized_returns=normalized_returns, - receiver_type=None, - has_multiple_returns=has_multiple_returns, - always_paren_returns=False - ) - - return sig_str - - -def normalize_go_signature_with_params(sig_str: str) -> str: - """ - Normalize a Go signature string for comparison while preserving parameter names. - - This is a specialized version for sync validation that handles shorthand - notation and keeps parameter names for exact matching. Use this when you need - to compare signatures where parameter names must match exactly. - - The general-purpose `normalize_go_signature()` removes parameter names and - is better suited for general signature normalization. - - Normalizes: - - Extra whitespace - - Comments - - Generic type parameters (for comparison purposes) - - Package name differences (generics.X vs X) - - Keeps: - - Parameter names (must match exactly) - - Return value names (must match exactly) - """ - # Common preprocessing - sig_str = _normalize_go_signature_preprocessing(sig_str, use_whitespace_normalize=True) - - # Normalize package names (specific approach for sync validation) - sig_str = _normalize_package_names_specific(sig_str) - - # Remove parameter names, keep only types - # Pattern: name Type -> Type - # Handle: ctx context.Context, path string -> context.Context, string - # Handle: offset, size int64 -> int64, int64 - - def _is_parameter_name_only(param: str) -> bool: - """Check if parameter looks like just a name (no type indicators).""" - return not any(c in param for c in [' ', '.', '*', '[', ']', '(', ')']) - - def _can_split_normalized_param(normalized: str) -> bool: - """Check if normalized parameter can be safely split by comma.""" - return (',' in normalized and - not any(c in normalized 
for c in ['*[', '[]', 'map['])) - - def _process_param_token(param: str, normalize_single_param_func) -> List[Tuple[str, str]]: - """Process a single parameter token, returning list of (tag, value) tuples.""" - if not param: - return [] - - if _is_parameter_name_only(param): - # Just a name, might be part of shorthand - keep it for later processing - return [('name', param)] - - normalized = normalize_single_param_func(param) - if _can_split_normalized_param(normalized): - return [('type', p.strip()) for p in normalized.split(',')] - else: - return [('type', normalized)] - - def _process_last_param_with_shorthand( - param: str, - params: List[Tuple[str, str]], - normalize_single_param_func - ) -> None: - """Process the last parameter, handling shorthand notation.""" - if not param: - return - - # Check if previous params ended with names (shorthand pattern) - if params and params[-1][0] == 'name': - # This is the type for the shorthand names - type_part = normalize_single_param_func(param) - # Replace all trailing 'name' entries with this type - i = len(params) - 1 - while i >= 0 and params[i][0] == 'name': - params[i] = ('type', type_part) - i -= 1 - else: - tokens = _process_param_token(param, normalize_single_param_func) - params.extend(tokens) - - def _resolve_remaining_names( - params: List[Tuple[str, str]] - ) -> List[str]: - """Resolve any remaining name entries to types, handling edge cases.""" - final_params = [] - i = 0 - while i < len(params): - if params[i][0] == 'name': - # Collect consecutive names - names = [params[i][1]] - i += 1 - while i < len(params) and params[i][0] == 'name': - names.append(params[i][1]) - i += 1 - # If next is a type, use it; otherwise these are invalid - if i < len(params) and params[i][0] == 'type': - type_part = params[i][1] - final_params.extend([type_part] * len(names)) - i += 1 - else: - # Invalid - just use the names as-is (shouldn't happen) - final_params.extend(names) - else: - final_params.append(params[i][1]) - i += 
1 - return final_params - - def normalize_param_list(param_str: str) -> str: - if not param_str.strip(): - return "" - # Split parameters by comma, but be careful with nested structures - params = [] - current = "" - paren_depth = 0 - bracket_depth = 0 - - for char in param_str: - if char == '(': - paren_depth += 1 - elif char == ')': - paren_depth -= 1 - elif char == '[': - bracket_depth += 1 - elif char == ']': - bracket_depth -= 1 - elif char == ',' and paren_depth == 0 and bracket_depth == 0: - # Found a top-level comma separator - param = current.strip() - if param: - tokens = _process_param_token(param, normalize_single_param) - params.extend(tokens) - current = "" - continue - current += char - - # Process last param - if current.strip(): - _process_last_param_with_shorthand( - current.strip(), params, normalize_single_param - ) - - # Handle any remaining name entries (shouldn't happen in valid Go, but handle gracefully) - final_params = _resolve_remaining_names(params) - return ", ".join(final_params) - - def _is_type_like(param: str) -> bool: - """Check if parameter looks like a type (starts with type indicators).""" - return param and (param.startswith('*') or param.startswith('[') or param[0].isupper()) - - def _extract_type_from_shorthand(parts: List[str]) -> str: - """Extract type from shorthand notation (e.g., 'offset, size int64').""" - type_part = parts[-1] - first_part = parts[0] - name_list = [n.strip() for n in first_part.split(',')] - # Return type repeated for each name - return ", ".join([type_part] * len(name_list)) - - def _extract_type_from_regular(parts: List[str]) -> str: - """Extract type from regular notation (e.g., 'name Type' or 'name *package.Type').""" - if len(parts) == 2: - return parts[-1] - else: - # Multiple words: might be name *package.Type - # Remove first word (the name) - return " ".join(parts[1:]) - - def normalize_single_param(param: str) -> str: - """Normalize a single parameter, handling shorthand notation.""" - # 
Remove leading parameter names - # Pattern: name Type or name1, name2 Type - # Handle: offset, size int64 -> int64 (expand to int64, int64) - - parts = param.split() - if len(parts) < 2: - # Single identifier - might be just a type or just a name - if _is_type_like(param): - return param - # Otherwise, it's probably just a name - return as-is (caller will handle) - return param - - # Check if first part has commas (shorthand) - first_part = parts[0] - if ',' in first_part: - # Shorthand: offset, size int64 - return _extract_type_from_shorthand(parts) - else: - # Regular: name Type - remove the name, keep the type - return _extract_type_from_regular(parts) - - # Extract and normalize function signatures - # Pattern: func Name(params) returns or func (r Receiver) Name(params) returns - func_match = _RE_FUNC_WITH_PARAMS.match(sig_str) - if func_match: - name = func_match.group(1) - params = func_match.group(2) - returns = func_match.group(3).strip() - - normalized_params = normalize_param_list(params) - # For returns, keep names and types - they must match exactly - # Expand shorthand in returns too - normalized_returns = normalize_param_list(returns) if returns else "" - - # Reconstruct using shared helper - receiver_match = _RE_RECEIVER_MATCH.match(sig_str) - if receiver_match: - receiver = receiver_match.group(1) - receiver_type = extract_receiver_type(receiver) - return _format_normalized_signature( - name=name, - normalized_params=normalized_params, - normalized_returns=normalized_returns, - receiver_type=receiver_type, - has_multiple_returns=False, # Not used when always_paren_returns=True - always_paren_returns=True - ) - else: - # For functions without receiver, always use parentheses for returns - # (this matches the behavior expected by sync validation) - return _format_normalized_signature( - name=name, - normalized_params=normalized_params, - normalized_returns=normalized_returns, - receiver_type=None, - has_multiple_returns=False, # Not used when 
always_paren_returns=True - always_paren_returns=True - ) - - return sig_str - - -@dataclass(frozen=True) -class Signature: - """ - Represents a Go function, method, or type signature. - - This is a shared dataclass used across multiple validation scripts. - Optional fields allow different scripts to track additional information - as needed. - """ - name: str - kind: str # 'func', 'method', 'type', 'interface' - receiver: Optional[str] = None # For methods: the receiver type - params: str = "" # Parameter list as string - returns: str = "" # Return types as string - location: str = "" # File path and line number - is_public: bool = True # Whether it's exported (starts with capital) - # Optional fields for scripts that need more detail - has_body: bool = False # Whether this is a full definition with body - method_count: int = 0 # For interfaces: number of methods in body - field_count: int = 0 # For structs: number of fields in body - generic_params: Optional[str] = None # Generic parameters like "[T any]" or None - - def normalized_key(self) -> str: - """Generate a normalized key for comparison.""" - if self.kind == 'method' and self.receiver: - return f"{self.receiver}.{self.name}" - elif self.kind in ('type', 'interface') and self.generic_params: - # Include generics in key to distinguish SigningKey from SigningKey[T] - return f"{self.name}{self.generic_params}" - return self.name - - def normalized_signature(self) -> str: - """Generate a normalized signature string for comparison.""" - # Normalize whitespace and remove comments - params = _RE_WHITESPACE_NORMALIZE.sub(' ', self.params.strip()) - returns = _RE_WHITESPACE_NORMALIZE.sub(' ', self.returns.strip()) - - if self.kind == 'method': - return f"func ({self.receiver}) {self.name}({params}) ({returns})" - elif self.kind == 'func': - return f"func {self.name}({params}) ({returns})" - elif self.kind == 'type': - return f"type {self.name}" - elif self.kind == 'interface': - return f"type {self.name} interface" 
- return f"{self.kind} {self.name}" - - def normalized_type_name(self) -> str: - """ - Get normalized type name (without generics for display purposes). - - For types with generics, returns just the base name. - For other types, returns the name as-is. - """ - if self.generic_params: - return normalize_generic_name(self.name) - return self.name - - def is_method(self) -> bool: - """Check if this is a method (has receiver).""" - return self.kind == 'method' and self.receiver is not None - - -def is_public_name(name: str) -> bool: - """ - Check if a name is public (exported) in Go. - - In Go, exported identifiers start with an uppercase letter. - - Args: - name: The name to check - - Returns: - True if the name is public (starts with uppercase letter) - """ - return bool(name and name[0].isupper()) - - -class InterfaceParser: - """ - Helper class for parsing Go interfaces with brace depth tracking. - - This handles the common pattern of tracking interface definitions - and their methods across multiple scripts. - """ - - def __init__(self): - self.in_interface = False - self.current_interface: Optional[str] = None - self.brace_depth = 0 - - def reset(self): - """Reset the parser state.""" - self.in_interface = False - self.current_interface = None - self.brace_depth = 0 - - def check_interface_start(self, line: str) -> Optional[str]: - """ - Check if a line starts an interface definition. 
- - Args: - line: The line to check - - Returns: - Interface name if this line starts an interface, None otherwise - """ - # Pattern: type Name interface { or type Name[T] interface { - interface_match = re.match( - r'^\s*type\s+(\w+)(?:\s*(\[[^\]]+\]))?\s+interface\s*\{', line - ) - if interface_match: - self.in_interface = True - self.current_interface = interface_match.group(1) - stripped = line.strip() - self.brace_depth = stripped.count('{') - stripped.count('}') - return self.current_interface - return None - - def update_brace_depth(self, line: str) -> bool: - """ - Update brace depth for current interface. - - Args: - line: The current line - - Returns: - True if still inside interface, False if interface closed - """ - if not self.in_interface: - return False - - stripped = line.strip() - self.brace_depth += stripped.count('{') - stripped.count('}') - - if self.brace_depth <= 0: - self.in_interface = False - self.current_interface = None - return False - - return True - - def is_in_interface(self) -> bool: - """Check if currently parsing an interface.""" - return self.in_interface - - def get_current_interface(self) -> Optional[str]: - """Get the name of the current interface being parsed.""" - return self.current_interface - - -def extract_interfaces_from_go_file( - file_path: Path, - parse_methods: bool = True -) -> List[Signature]: - """ - Extract all interfaces (and optionally their methods) from a Go source file. 
- - Args: - file_path: Path to the Go source file - parse_methods: If True, also extract interface methods as separate signatures - - Returns: - List of Signature objects for interfaces (and their methods if parse_methods=True) - """ - interfaces = [] - methods = [] - - try: - resolved_path = file_path.resolve() - content = file_path.read_text(encoding='utf-8') - lines = content.split('\n') - - interface_parser = InterfaceParser() - - for line_num, line in enumerate(lines, 1): - stripped = line.strip() - - # Skip empty lines and comments - if not stripped or stripped.startswith('//'): - continue - - # Check for interface start using InterfaceParser - interface_name = interface_parser.check_interface_start(line) - if interface_name: - is_public = is_public_name(interface_name) if interface_name else False - interfaces.append(Signature( - name=interface_name, - kind='interface', - location=f"{resolved_path}:{line_num}", - is_public=is_public - )) - continue - - # Track interface brace depth using InterfaceParser - if interface_parser.is_in_interface(): - # Check brace depth before updating to catch methods on closing line - brace_depth_before = interface_parser.brace_depth - current_interface = interface_parser.get_current_interface() - still_in_interface = interface_parser.update_brace_depth(line) - - # Check for interface method if we're still in interface or on closing line - if parse_methods and ( - (still_in_interface and interface_parser.brace_depth > 0) or - (brace_depth_before > 0 and not still_in_interface and '{' not in stripped) - ): - sig = parse_go_def_signature(line, location=f"{resolved_path}:{line_num}") - if sig and sig.kind in ('func', 'method'): - # Interface methods don't have receivers in the interface definition - # but we track them as methods of the interface type - methods.append(Signature( - name=sig.name, - kind='method', - receiver=current_interface, - params=sig.params, - returns=sig.returns, - location=f"{resolved_path}:{line_num}", - 
is_public=sig.is_public - )) - - if not still_in_interface: - # Interface closed, parser already reset - pass - continue - - except Exception as e: - print(f"Warning: Error reading {file_path}: {e}") - - # Return interfaces first, then methods - return interfaces + methods - - -def extract_interfaces_from_markdown( - content: str, - file_path: Path, - start_line: int = 1, - parse_methods: bool = True, - skip_examples: bool = True, - lines: Optional[List[str]] = None -) -> List[Signature]: - """ - Extract all interfaces (and optionally their methods) from Go code blocks in markdown. - - Args: - content: Markdown content as string - file_path: Path to the markdown file (for location strings) - start_line: Starting line number for the content (default: 1) - parse_methods: If True, also extract interface methods as separate signatures - skip_examples: If True, skip interfaces that are marked as examples - lines: Optional list of all lines in the file (for example detection) - - Returns: - List of Signature objects for interfaces (and their methods if parse_methods=True) - """ - interfaces = [] - methods = [] - - try: - resolved_path = file_path.resolve() - go_blocks = find_go_code_blocks(content) - - for block_start_line, block_end_line, code_content in go_blocks: - # Process each code block - block_lines = code_content.split('\n') - interface_parser = InterfaceParser() - - for i, line in enumerate(block_lines): - # Calculate actual line number in file (1-indexed) - line_num = block_start_line + i - - stripped = line.strip() - - # Skip empty lines and comments - if not stripped or stripped.startswith('//'): - continue - - # Check if this is an example signature - is_example = False - if skip_examples: - is_example = is_example_code( - code_content, block_start_line, - lines=lines, - check_single_line=i - ) - - # Check for interface start using InterfaceParser - interface_name = interface_parser.check_interface_start(line) - if interface_name: - # Skip if this is an 
example - if is_example: - continue - - # Extract generic parameters from the line - generic_match = re.match( - r'^\s*type\s+\w+\s*(\[[^\]]+\])?\s+interface\s*\{', line - ) - generic_params = generic_match.group(1) if generic_match else None - is_public = is_public_name(interface_name) if interface_name else False - - # Check if this is a stub (interface without body or minimal body) - has_full_body = interface_parser.brace_depth > 0 - - # Count methods if it's a full interface - method_count = 0 - if has_full_body and parse_methods: - # Look ahead to count methods within this code block - temp_brace_depth = interface_parser.brace_depth - for j in range(i + 1, len(block_lines)): - temp_line = block_lines[j] - temp_stripped = temp_line.strip() - if not temp_stripped or temp_stripped.startswith('//'): - continue - temp_brace_depth += temp_stripped.count('{') - temp_stripped.count('}') - temp_sig = parse_go_def_signature(temp_line, location="") - if (temp_sig and temp_sig.kind in ('func', 'method') and - temp_brace_depth > 0): - method_count += 1 - if temp_brace_depth <= 0: - break - - interfaces.append(Signature( - name=interface_name, - kind='interface', - location=f"{resolved_path}:{line_num}", - is_public=is_public, - has_body=has_full_body, - method_count=method_count, - generic_params=generic_params - )) - continue - - # Track interface brace depth using InterfaceParser - if interface_parser.is_in_interface(): - # Check brace depth before updating to catch methods on closing line - brace_depth_before = interface_parser.brace_depth - current_interface = interface_parser.get_current_interface() - still_in_interface = interface_parser.update_brace_depth(line) - - # Check for interface method if we're still in interface or on closing line - if parse_methods and ( - (still_in_interface and interface_parser.brace_depth > 0) or - (brace_depth_before > 0 and not still_in_interface and '{' not in stripped) - ): - sig = parse_go_def_signature(line, 
location=f"{resolved_path}:{line_num}") - if sig and sig.kind in ('func', 'method'): - # Interface methods don't have receivers in the interface definition - # but we track them as methods of the interface type - methods.append(Signature( - name=sig.name, - kind='method', - receiver=current_interface, - params=sig.params, - returns=sig.returns, - location=f"{resolved_path}:{line_num}", - is_public=sig.is_public, - has_body=False # Methods in interface are stubs - )) - - if not still_in_interface: - # Interface closed, parser already reset - pass - continue - - except Exception as e: - print(f"Warning: Error processing interfaces from {file_path}: {e}") - - # Return interfaces first, then methods - return interfaces + methods - - -def find_definition_line_index(code: str, is_type: bool) -> Optional[int]: - """ - Find the line index (0-indexed) of the first definition in code. - - Args: - code: Go code block content - is_type: True to find type definition, False to find function definition - - Returns: - Line index (0-indexed) of the definition, or None if not found - """ - lines = code.split('\n') - for i, line in enumerate(lines): - stripped = line.strip() - # Skip comments and empty lines - if not stripped or stripped.startswith('//'): - continue - - sig = parse_go_def_signature(line) - if sig: - if is_type: - # Check for type definition - if sig.kind not in ('func', 'method'): - return i - else: - # Check for function/method definition - if sig.kind in ('func', 'method'): - return i - - return None - - -def is_example_definition( - code: str, - start_line: int, - lines: List[str], - heading: Optional[str], - is_type: bool -) -> bool: - """ - Check if a definition (type or function) is example code. - - Uses the existing find_definition_line_index to find the signature, - then checks if it's example code using the unified is_example_code utility. 
- - Args: - code: Go code block content - start_line: Line number where code block starts (1-indexed) - lines: All lines of the file (for context checking) - heading: Optional heading text (for example detection) - is_type: True for type definitions, False for function/method definitions - - Returns: - True if the definition is example code, False otherwise - """ - def_line_idx = find_definition_line_index(code, is_type=is_type) - if def_line_idx is None: - return False - - return is_example_code( - code, start_line, - lines=lines, - heading_text=heading, - check_prose_before_block=True, - check_single_line=def_line_idx # 0-indexed line number within code block - ) - - -def count_go_definitions( - code: str, - filter_example: bool = False, - lines: Optional[List[str]] = None, - start_line: int = 1, - heading_text: Optional[str] = None -) -> Dict[str, int]: - """ - Count Go definitions in code using the unified parser. - - Args: - code: Go code block content - filter_example: If True, skip example code when counting - lines: All lines of the file (required if filter_example=True) - start_line: Line number where code block starts (required if filter_example=True) - heading_text: Optional heading text (used for example detection if filter_example=True) - - Returns: - Dictionary with counts: - - 'func': count of functions - - 'method': count of methods - - 'type': count of all types (struct, interface, alias, etc., excluding function types) - - 'func_type': count of function types (type Name func(...)) - """ - counts: Dict[str, int] = { - 'func': 0, - 'method': 0, - 'type': 0, - 'func_type': 0 - } - - code_lines = code.split('\n') - for line_idx, line in enumerate(code_lines): - # Filter example code if requested - if filter_example and lines is not None: - if is_example_code( - code, start_line, - lines=lines, - heading_text=heading_text, - check_prose_before_block=True, - check_single_line=line_idx - ): - continue - - # Try to parse as definition - sig = 
parse_go_def_signature(line) - if sig: - if sig.kind == 'func': - counts['func'] += 1 - elif sig.kind == 'method': - counts['method'] += 1 - else: - # All other kinds are types (struct, interface, alias, etc.) - counts['type'] += 1 - else: - # Check for function types (parse_go_def_signature excludes these) - stripped = line.strip() - if stripped and not stripped.startswith('//'): - if _RE_FUNC_TYPE_DEF.match(line): - counts['func_type'] += 1 - - return counts - - -def is_definition_start_line(line: str) -> bool: - """ - Return True if the line starts a type, func, method, or func type definition. - - Used to detect signature-only code blocks: only definition-start lines - and continuation lines (braces, indented body) should be present. - - Args: - line: Single line of Go code (may have leading/trailing whitespace). - - Returns: - True if the line is a definition start (type, func, method, func type). - """ - stripped = line.strip() - if not stripped: - return False - if parse_go_def_signature(stripped): - return True - if _RE_FUNC_TYPE_DEF.match(stripped): - return True - return False - - -def is_continuation_line(line: str) -> bool: - """ - Return True if the line is brace-only or indented (part of a definition body). - - Used together with is_definition_start_line to classify lines in a code block - when detecting signature-only blocks (struct/interface bodies, func bodies). - - Args: - line: Single line of Go code (may have leading/trailing whitespace). - - Returns: - True if the line is only braces or starts with whitespace (continuation). - """ - stripped = line.strip() - if stripped in ('{', '}'): - return True - if len(line) > 0 and line[0].isspace(): - return True - return False - - -def is_signature_only_code_block(code: str) -> bool: - """ - Return True if the Go code block contains only definition signatures and bodies. 
- - After removing comments, every non-empty line must be either a definition-start - line (type/func/method/func type) or a continuation line (brace-only or - indented body). Comment lines are removed and not counted. - - Used by documentation audits to skip requirement coverage for sections that - only list API signatures (e.g. struct definitions with fields, method stubs). - - Args: - code: Go code block content (no markdown fences). - - Returns: - True if the block has at least one definition and no other substantive lines. - """ - cleaned = remove_go_comments(code, multiline=True) - non_empty_lines = [line for line in cleaned.split('\n') if line.strip()] - if not non_empty_lines: - return False - definition_start_count = 0 - for line in non_empty_lines: - if is_definition_start_line(line): - definition_start_count += 1 - elif not is_continuation_line(line): - return False - return definition_start_count >= 1 - - -def find_first_definition(code: str, is_type: bool) -> Optional[Signature]: - """ - Find and parse the first definition in code. - - This is a convenience function that combines finding the definition line - and parsing it into a Signature object. - - Args: - code: Go code block content - is_type: True for type definitions, False for function/method definitions - - Returns: - Signature object or None if no definition found - """ - def_line_idx = find_definition_line_index(code, is_type=is_type) - if def_line_idx is None: - return None - code_lines = code.split('\n') - return parse_go_def_signature(code_lines[def_line_idx]) - - -def check_kind_word_after( - heading: str, - search_term: str, - kind_word: str, - display_name: str, - error_prefix: str, - errors: List[str] -) -> None: - """ - Check if kind word appears immediately after the search term in heading. 
- - Args: - heading: The heading text (will be converted to lowercase internally) - search_term: The search term to look for (e.g., "Package" or "FileEntry.GetState") - kind_word: The expected kind word (e.g., "Method", "Function", "Struct") - display_name: The name to display in error messages - error_prefix: Prefix for error messages (e.g., "Method heading") - errors: List to append error messages to - - Returns: - None (modifies errors list in place) - """ - heading_lower = heading.lower() - search_term_lower = search_term.lower() - kind_word_lower = kind_word.lower() - - if search_term_lower in heading_lower: - # Find position of search term in heading (case-insensitive) - search_pos = heading_lower.find(search_term_lower) - if search_pos != -1: - # Check if kind word appears immediately after (allowing for whitespace) - after_search = heading_lower[search_pos + len(search_term_lower):].strip() - if not after_search.startswith(kind_word_lower): - errors.append( - f'{error_prefix} should include "{kind_word}" immediately after ' - f'{display_name}' - ) - else: - # If search term is present but we couldn't find it case-insensitively, - # check if kind word is anywhere in heading - if kind_word_lower not in heading_lower: - errors.append( - f'{error_prefix} should include "{kind_word}" immediately after ' - f'{display_name}' - ) +__all__ = [ + "EXAMPLE_MARKERS", + "EXAMPLE_NAME_PREFIXES", + "InterfaceParser", + "Signature", + "check_kind_word_after", + "count_go_definitions", + "determine_type_kind", + "extract_go_doc_comment_above", + "extract_interfaces_from_go_file", + "extract_interfaces_from_markdown", + "extract_receiver_type", + "find_definition_line_index", + "find_first_definition", + "find_go_code_blocks", + "is_continuation_line", + "is_definition_start_line", + "is_example_code", + "is_example_definition", + "is_example_signature_name", + "is_in_go_code_block", + "is_public_name", + "is_signature_only_code_block", + "normalize_generic_name", + 
"normalize_go_signature", + "normalize_go_signature_with_params", + "parse_go_def_signature", + "remove_go_comments", +] diff --git a/scripts/lib/_go_code_utils_test.py b/scripts/lib/_go_code_utils_test.py index b48652f4..81f0b99a 100644 --- a/scripts/lib/_go_code_utils_test.py +++ b/scripts/lib/_go_code_utils_test.py @@ -1,21 +1,27 @@ #!/usr/bin/env python3 """Test script for _go_code_utils normalization logic.""" -import sys -from pathlib import Path - -# Add scripts directory to path -sys.path.insert(0, str(Path(__file__).parent.parent)) - -from lib._go_code_utils import ( # noqa: E402 - normalize_go_signature, normalize_generic_name, extract_receiver_type, - parse_go_def_signature, - find_go_code_blocks, is_in_go_code_block, is_example_code, +from lib._go_code_utils import ( + extract_receiver_type, + find_go_code_blocks, + is_continuation_line, + is_definition_start_line, + is_example_code, is_example_signature_name, - is_definition_start_line, is_continuation_line, is_signature_only_code_block + is_in_go_code_block, + is_signature_only_code_block, + normalize_generic_name, + normalize_go_signature, + parse_go_def_signature, ) +def _check(condition: bool, message: str = "check failed") -> None: + """Raise AssertionError if condition is false (avoids assert for Bandit B101).""" + if not condition: + raise AssertionError(message) + + def test_normalize_generic_name(): """Test generic name normalization.""" print("Testing normalize_generic_name:") @@ -491,11 +497,11 @@ def test_is_signature_only_code_block(): status = "✓" if not result else "✗" print(f" {status} type + var -> {result} (expected False)") # Definition start line detection - assert is_definition_start_line("type SymlinkConvertOptions struct {") - assert is_definition_start_line("func (p *Package) Foo() error") - assert not is_definition_start_line(" PrimaryPath Option[string]") - assert is_continuation_line(" PrimaryPath Option[string]") - assert is_continuation_line("}") + 
_check(is_definition_start_line("type SymlinkConvertOptions struct {")) + _check(is_definition_start_line("func (p *Package) Foo() error")) + _check(not is_definition_start_line(" PrimaryPath Option[string]")) + _check(is_continuation_line(" PrimaryPath Option[string]")) + _check(is_continuation_line("}")) print(" ✓ is_definition_start_line / is_continuation_line edge cases") print() diff --git a/scripts/lib/_index_utils.py b/scripts/lib/_index_utils.py index e0d1ee5e..cd218631 100644 --- a/scripts/lib/_index_utils.py +++ b/scripts/lib/_index_utils.py @@ -4,8 +4,9 @@ from dataclasses import dataclass, field from typing import Dict, List, Literal, Optional, Set, TYPE_CHECKING -from lib._go_code_utils import normalize_generic_name -from lib._validation_utils import ProseSection, generate_anchor_from_heading +from lib._validation_utils import ProseSection +from lib import _index_utils_parsing +from lib import _index_utils_rendering if TYPE_CHECKING: pass @@ -13,11 +14,6 @@ SectionKind = Literal["type", "method", "func"] _SECTION_NUMBER_RE = re.compile(r"^\d+(?:\.\d+)*$") -_INDEX_HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$") -_INDEX_SECTION_NUMBER_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:\.)?\s+(.+)$") -_INDEX_ENTRY_RE = re.compile(r"^\s*-\s+\*\*`([^`]+)`\*\*") -_INDEX_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)#]+)(?:#([^)]+))?\)") -_TITLE_HEADING_RE = re.compile(r"^#\s+(.+)$") def _split_words_lower(text: str) -> list[str]: @@ -26,6 +22,13 @@ def _split_words_lower(text: str) -> list[str]: return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", text) if w] +def _entry_sort_key(name: str) -> tuple[int, str, str]: + if not name: + return (1, "", "") + first = name[0] + return (0 if first.isupper() else 1, name.lower(), name) + + @dataclass(slots=True) class IndexEntry: """An entry in the index file.""" @@ -59,10 +62,11 @@ def link_target(self) -> str: return f"{self.link_file}#{self.link_anchor}" return self.link_file - def sort_key(self) -> str: + def sort_key(self) -> 
tuple[int, str, str]: if "." in self.name: - return self.name.split(".", 1)[1].lower() - return self.name.lower() + method_name = self.name.split(".", 1)[1] + return _entry_sort_key(method_name) + return _entry_sort_key(self.name) @dataclass(slots=True) @@ -146,14 +150,11 @@ def sort_expected_entries(self) -> None: if not self.expected_entries: return - def sort_key(name: str) -> tuple[str, int, str]: - if not name: - return ("", 1, "") - first = name[0] - return (name.lower(), 0 if first.isupper() else 1, name) - - sorted_names = sorted(self.expected_entries.keys(), key=sort_key) - self.expected_entries = {name: self.expected_entries[name] for name in sorted_names} + sorted_items = sorted( + self.expected_entries.items(), + key=lambda item: item[1].sort_key(), + ) + self.expected_entries = dict(sorted_items) def iter_sections(self) -> list[IndexSection]: ordered = [self] @@ -197,14 +198,14 @@ def derive_heading_kind(heading_text: str) -> SectionKind: def _validate_section_number(value: str) -> None: if not _SECTION_NUMBER_RE.match(value): raise ValueError( - "section_number must be a dotted number like '1', '2.4', or '3.5.6', got: %r" - % (value,) + "section_number must be a dotted number like '1', '2.4', or '3.5.6', got: " + f"{value!r}" ) @staticmethod def _validate_heading_level(value: int) -> None: if value not in (2, 3, 4): - raise ValueError("heading_level must be 2, 3, or 4, got: %r" % (value,)) + raise ValueError(f"heading_level must be 2, 3, or 4, got: {value!r}") @staticmethod def _validate_parent_heading( @@ -221,8 +222,7 @@ def _validate_parent_heading( if parent_heading.heading_level >= heading_level: raise ValueError( "parent_heading.heading_level must be less than heading_level " - "(parent=%r, child=%r)" - % (parent_heading.heading_level, heading_level) + f"(parent={parent_heading.heading_level!r}, child={heading_level!r})" ) @staticmethod @@ -236,393 +236,19 @@ def _normalize_lower_list(values: list[str]) -> list[str]: return out -def 
_extract_entry_description( - lines: list[str], - entry_line_num: int, -) -> tuple[list[str], Optional[int], Optional[int]]: - description_lines: List[str] = [] - description_start_line = None - description_end_line = None - - i = entry_line_num - while i < len(lines): - next_line = lines[i] - stripped = next_line.rstrip() - - if _INDEX_ENTRY_RE.match(stripped): - break - if re.match(r"^##+\s+", stripped): - break - if stripped and not (stripped.startswith(" ") or stripped.startswith(" ")): - if not stripped.strip(): - i += 1 - continue - break - - if stripped.startswith(" - ") or stripped.startswith(" - "): - if description_start_line is None: - description_start_line = i + 1 - bullet_text = stripped.split("- ", 1)[1].strip() - if bullet_text: - description_lines.append(bullet_text) - description_end_line = i + 1 - elif (stripped.startswith(" ") or stripped.startswith(" ")) and description_lines: - # Continuation of previous bullet line. - description_lines[-1] += " " + stripped.lstrip().strip() - description_end_line = i + 1 - - i += 1 - - return description_lines, description_start_line, description_end_line - - def parse_entry_descriptions( index_content: str, entry_line_numbers: Dict[str, int], ) -> Dict[str, tuple[bool, Optional[str], Optional[int]]]: - """ - Parse descriptive text (indented bullets) for each index entry. 
+ return _index_utils_parsing.parse_entry_descriptions(index_content, entry_line_numbers) - Args: - index_content: Full content of the index file - entry_line_numbers: Dict mapping normalized entry name -> line number - Returns: - Dict mapping normalized entry name -> - (has_description, description_text, description_line_start) - """ - lines = index_content.split("\n") - descriptions: Dict[str, tuple[bool, Optional[str], Optional[int]]] = {} - entry_pattern = r"^\s*-\s+\*\*`([^`]+)`\*\*" - - line_to_entry: Dict[int, str] = { - line_num: name for name, line_num in entry_line_numbers.items() - } - - for line_num, line in enumerate(lines, 1): - if line_num not in line_to_entry: - continue - - entry_name = line_to_entry[line_num] - description_lines = [] - description_start_line = None - - i = line_num - while i < len(lines): - next_line = lines[i] - stripped = next_line.rstrip() - - if re.match(entry_pattern, stripped): - break - if re.match(r"^##+\s+", stripped): - break - if stripped and not (stripped.startswith(" ") or stripped.startswith(" ")): - if not stripped.strip(): - i += 1 - continue - break - - if stripped.startswith(" - ") or stripped.startswith(" - "): - if description_start_line is None: - description_start_line = i - bullet_text = stripped.split("- ", 1)[1].strip() - if bullet_text: - description_lines.append(bullet_text) - elif (stripped.startswith(" ") or stripped.startswith(" ")) and description_lines: - if description_lines: - description_lines[-1] += " " + stripped.lstrip().strip() - - i += 1 - - if description_lines: - description_text = " ".join(description_lines).strip() - has_description = len(description_text) >= 20 - descriptions[entry_name] = ( - has_description, - description_text if has_description else None, - description_start_line, - ) - else: - descriptions[entry_name] = (False, None, None) - - return descriptions - - -def _parse_overview_section( - lines: list[str], - end_line: int, -) -> Optional[ProseSection]: - headings: 
List[tuple[int, str, int]] = [] - for line_num, line in enumerate(lines[:end_line], 1): - match = _INDEX_HEADING_RE.match(line) - if not match: - continue - level = len(match.group(1)) - heading_text = match.group(2).strip() - if level < 2: - continue - headings.append((level, heading_text, line_num)) - - if not headings: - return None - - root: Optional[ProseSection] = None - stack: List[ProseSection] = [] - nodes_by_line: Dict[int, ProseSection] = {} - - def _split_heading(heading_text: str) -> tuple[str, Optional[str]]: - num_match = _INDEX_SECTION_NUMBER_RE.match(heading_text) - if not num_match: - return heading_text.strip(), None - return num_match.group(2).strip(), num_match.group(1).strip() - - for level, heading_text, line_num in headings: - heading_str, heading_num = _split_heading(heading_text) - node = ProseSection( - heading_str=heading_str, - heading_num=heading_num, - heading_level=level, - heading_line=line_num, - content="", - ) - while stack and stack[-1].heading_level >= level: - stack.pop() - if stack: - parent = stack[-1] - node.parent_section = parent - parent.child_sections.append(node) - elif root is None: - root = node - else: - node.parent_section = root - root.child_sections.append(node) - stack.append(node) - nodes_by_line[line_num] = node - - for idx, (_level, _heading_text, line_num) in enumerate(headings): - node_line_start = line_num + 1 - node_line_end = end_line if idx + 1 >= len(headings) else headings[idx + 1][2] - 1 - node_lines = lines[node_line_start - 1:node_line_end] - content = "\n".join(node_lines).strip() - node = nodes_by_line.get(line_num) - if node is None: - continue - node.content = content - node.lines = (node_line_start, node_line_end) - - in_code = False - code_start = None - for offset, line in enumerate(node_lines, node_line_start): - stripped = line.strip() - if stripped.startswith("```"): - if not in_code: - in_code = True - code_start = offset - else: - in_code = False - code_type = 
stripped[3:].strip().split()[0] if stripped[3:].strip() else "" - node.code_blocks.append((code_start or offset, offset, code_type)) - node.has_code = True - - return root - - -def parse_index(index_content: str) -> ParsedIndex: - """ - Parse docs/tech_specs/api_go_defs_index.md into structured sections and entries. - """ - sections: Dict[str, IndexSection] = {} - section_order: List[str] = [] - section_path_lines: Dict[str, List[int]] = {} - - current_h2: Optional[IndexSection] = None - current_h3: Optional[IndexSection] = None - current_h4: Optional[IndexSection] = None - - title = "" - lines = index_content.split("\n") - for line in lines: - title_match = _TITLE_HEADING_RE.match(line) - if title_match: - title = title_match.group(1).strip() - break - - first_def_section_line = len(lines) - for line_num, line in enumerate(lines, 1): - heading_match = _INDEX_HEADING_RE.match(line) - if not heading_match: - continue - hashes = heading_match.group(1) - heading_level = len(hashes) - if heading_level not in (2, 3, 4): - continue - raw_heading = heading_match.group(2).strip() - n = _INDEX_SECTION_NUMBER_RE.match(raw_heading) - if not n: - continue - section_number = n.group(1).strip() - if section_number.split(".")[0] == "0": - continue - first_def_section_line = line_num - break - - overview = _parse_overview_section(lines, first_def_section_line - 1) - if overview: - overview.file_path = "api_go_defs_index.md" - - for line_num, line in enumerate(lines, 1): - heading_match = _INDEX_HEADING_RE.match(line) - if heading_match: - hashes = heading_match.group(1) - heading_level = len(hashes) - if heading_level not in (2, 3, 4): - continue - - raw_heading = heading_match.group(2).strip() - n = _INDEX_SECTION_NUMBER_RE.match(raw_heading) - if not n: - continue - - section_number = n.group(1).strip() - if section_number.split(".")[0] == "0": - continue - heading_text = n.group(2).strip() - - if heading_level == 2: - parent = None - current_h2 = None - current_h3 = None - 
current_h4 = None - elif heading_level == 3: - parent = current_h2 - current_h3 = None - current_h4 = None - else: - parent = current_h3 - current_h4 = None - - if parent is None and heading_level != 2: - continue - - heading_kind = IndexSection.derive_heading_kind(heading_text) - node = IndexSection( - section_number=section_number, - heading_level=heading_level, - parent_heading=parent, - heading_text=heading_text, - kind=heading_kind, - ) - if parent is not None: - parent.add_child(node) - - section_path = node.path_label() - section_path_lines.setdefault(section_path, []).append(line_num) - if section_path in sections: - continue - - sections[section_path] = node - section_order.append(section_path) - - if heading_level == 2: - current_h2 = node - elif heading_level == 3: - current_h3 = node - else: - current_h4 = node - continue - - entry_match = _INDEX_ENTRY_RE.match(line) - if not entry_match: - continue - - raw_name = entry_match.group(1) - name = normalize_generic_name(raw_name) - - section_node = current_h4 or current_h3 or current_h2 - if section_node is None: - continue - - link_match = _INDEX_LINK_RE.search(line) - link_text = "" - link_file = "" - link_anchor: Optional[str] = None - if link_match: - link_text = link_match.group(1).strip() - link_file = link_match.group(2).strip() - anchor = link_match.group(3) - link_anchor = anchor.strip() if anchor else None - - entry = IndexEntry( - name=name, - raw_name=raw_name, - current_section=section_node.path_label(), - link_text=link_text, - link_file=link_file, - link_anchor=link_anchor, - line_number=line_num, - kind=section_node.kind, - ) - section_node.add_entry(entry) - section_node.current_entries[name] = entry - - duplicates = {path: lines for path, lines in section_path_lines.items() if len(lines) > 1} - if duplicates: - details = ", ".join( - f"{path} (lines {', '.join(str(line_num) for line_num in dup_lines)})" - for path, dup_lines in sorted(duplicates.items()) - ) - raise ValueError(f"Duplicate 
headings detected in index file: {details}") - - for section_path in section_order: - section = sections[section_path] - for entry in section.entries: - desc_lines, desc_start, desc_end = _extract_entry_description( - lines, entry.line_number - ) - entry.description_lines = desc_lines - entry.description_line_start = desc_start - entry.description_line_end = desc_end - if desc_lines: - desc_text = " ".join(desc_lines).strip() - entry.description_text = desc_text - entry.has_description = len(desc_text) >= 20 - - unsorted_types = IndexSection( - section_number="0", - heading_level=2, - parent_heading=None, - heading_text="Unsorted Types", - kind="type", - ) - unsorted_methods = IndexSection( - section_number="0", - heading_level=2, - parent_heading=None, - heading_text="Unsorted Methods", - kind="method", - ) - unsorted_funcs = IndexSection( - section_number="0", - heading_level=2, - parent_heading=None, - heading_text="Unsorted Functions", - kind="func", - ) - unsorted_paths = [ - unsorted_types.path_label(), - unsorted_methods.path_label(), - unsorted_funcs.path_label(), - ] - sections[unsorted_paths[0]] = unsorted_types - sections[unsorted_paths[1]] = unsorted_methods - sections[unsorted_paths[2]] = unsorted_funcs - - return ParsedIndex( - sections=sections, - section_order=section_order, - overview=overview, - unsorted_paths=unsorted_paths, - title=title, +def parse_index(index_content: str) -> "ParsedIndex": + return _index_utils_parsing.parse_index( + index_content, + index_section_cls=IndexSection, + index_entry_cls=IndexEntry, + parsed_index_cls=ParsedIndex, ) @@ -716,6 +342,20 @@ def get_link_update_entries(self) -> List[IndexEntry]: updates.append(entry) return updates + def get_reordered_entries(self) -> List[IndexEntry]: + reordered: List[IndexEntry] = [] + for section_path in self.section_order: + section = self.sections.get(section_path) + if not section: + continue + for entry in section.current_entries.values(): + if entry.entry_status == 
"reordered": + reordered.append(entry) + for entry in section.expected_entries.values(): + if entry.entry_status == "reordered": + reordered.append(entry) + return reordered + def get_unresolved_entries(self) -> List[IndexEntry]: unresolved: List[IndexEntry] = [] for section_path in self.unsorted_paths: @@ -754,119 +394,20 @@ def sync_expected_descriptions(self) -> None: entry.description_lines = list(description_map[entry.name]) def render_full_tree(self) -> List[str]: - lines: List[str] = [] - for section_path in self.section_order: - section = self.sections.get(section_path) - if section is None: - continue - lines.append(section_path) - entries: Dict[str, IndexEntry] = dict[str, IndexEntry](section.expected_entries) - for name, entry in section.current_entries.items(): - if name in entries: - continue - if entry.entry_status in ("orphaned", "removed"): - entries[name] = entry - for name in sorted(entries.keys()): - entry = entries[name] - marker = "" - if entry.entry_status == "added": - marker = " [ADDED]" - elif entry.entry_status == "moved": - marker = " [MOVED]" - elif entry.entry_status == "unresolved": - marker = " [UNRESOLVED]" - elif entry.entry_status == "orphaned": - marker = " [ORPHANED]" - elif entry.entry_status == "removed": - marker = " [REMOVED]" - lines.append(f"- {name}{marker}") - lines.append("") - return lines + return _index_utils_rendering.render_full_tree(self) def to_markdown(self) -> str: - lines: List[str] = [] - title = self.title.strip() if self.title else "" - if title: - lines.append(f"# {title}") - lines.append("") - - toc_lines = self._render_toc() - if toc_lines: - lines.extend(toc_lines) - lines.append("") - - if self.overview: - lines.extend(self._render_prose_section(self.overview)) - - for section_path in self.section_order: - section = self.sections.get(section_path) - if section is None: - continue - lines.extend(self._render_section_markdown(section)) - - return "\n".join(lines).rstrip() + "\n" + return 
_index_utils_rendering.index_to_markdown(self) def _render_toc(self) -> List[str]: - toc_lines: List[str] = [] - if self.overview: - label = self._format_prose_heading(self.overview) - if label: - toc_lines.append( - f"- [{label}]({generate_anchor_from_heading(label, include_hash=True)})" - ) - for section_path in self.section_order: - section = self.sections.get(section_path) - if section is None: - continue - label = section.heading_label() - indent = max(section.heading_level - 2, 0) * 2 - prefix = " " * indent - anchor = generate_anchor_from_heading(label, include_hash=True) - toc_lines.append(f"{prefix}- [{label}]({anchor})") - return toc_lines + return _index_utils_rendering.render_toc(self) def _render_prose_section(self, section: ProseSection) -> List[str]: - lines: List[str] = [] - heading_label = self._format_prose_heading(section) - if heading_label: - lines.append(f"{'#' * section.heading_level} {heading_label}") - lines.append("") - if section.content: - lines.extend(section.content.splitlines()) - lines.append("") - for child in section.child_sections: - lines.extend(self._render_prose_section(child)) - return lines + return _index_utils_rendering.render_prose_section(self, section) @staticmethod def _format_prose_heading(section: ProseSection) -> str: - if section.heading_num: - if section.heading_level == 2: - return f"{section.heading_num}. 
{section.heading_str}" - return f"{section.heading_num} {section.heading_str}" - return section.heading_str + return _index_utils_rendering.format_prose_heading(section) def _render_section_markdown(self, section: IndexSection) -> List[str]: - lines = [f"{'#' * section.heading_level} {section.heading_label()}"] - lines.append("") - for entry in section.expected_entries.values(): - current_entry = self.find_current_entry(entry.name) - raw_name = entry.raw_name - link_text = entry.link_text - link_target = entry.link_target() - if current_entry: - raw_name = current_entry.raw_name or raw_name - link_text = current_entry.link_text or link_text - link_target = current_entry.link_target() - if current_entry.needs_link_update: - link_target = entry.link_target() - link_label = link_text or "Spec" - lines.append(f"- **`{raw_name}`** - [{link_label}]({link_target})") - if entry.description_lines: - for desc_line in entry.description_lines: - if desc_line.startswith("CONT: "): - lines.append(f" {desc_line[len('CONT: '):]}") - else: - lines.append(f" - {desc_line}") - lines.append("") - return lines + return _index_utils_rendering.render_section_markdown(self, section) diff --git a/scripts/lib/_index_utils_parsing.py b/scripts/lib/_index_utils_parsing.py new file mode 100644 index 00000000..a1d1b415 --- /dev/null +++ b/scripts/lib/_index_utils_parsing.py @@ -0,0 +1,456 @@ +from __future__ import annotations + +import re +from typing import Dict, List, Optional, TYPE_CHECKING, Type + +from lib._go_code_utils import normalize_generic_name +from lib._validation_utils import ProseSection + +if TYPE_CHECKING: + from lib._index_utils import IndexEntry, IndexSection, ParsedIndex + +_INDEX_HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$") +_INDEX_SECTION_NUMBER_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:\.)?\s+(.+)$") +_INDEX_ENTRY_RE = re.compile(r"^\s*-\s+\*\*`([^`]+)`\*\*") +_INDEX_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)#]+)(?:#([^)]+))?\)") +_TITLE_HEADING_RE = re.compile(r"^#\s+(.+)$") 
+ + +def _is_entry_or_heading_line(stripped: str, entry_pattern: str) -> bool: + if re.match(entry_pattern, stripped): + return True + return bool(re.match(r"^##+\s+", stripped)) + + +def _is_description_line(stripped: str) -> bool: + return stripped.startswith(" ") or stripped.startswith(" ") + + +def _collect_description_lines( + lines: list[str], + entry_line_num: int, + entry_pattern: str, +) -> tuple[list[str], Optional[int], Optional[int]]: + description_lines: List[str] = [] + description_start_line = None + description_end_line = None + + i = entry_line_num + while i < len(lines): + stripped = lines[i].rstrip() + + if _is_entry_or_heading_line(stripped, entry_pattern): + break + if stripped and not _is_description_line(stripped): + if not stripped.strip(): + i += 1 + continue + break + + if stripped.startswith(" - ") or stripped.startswith(" - "): + if description_start_line is None: + description_start_line = i + 1 + bullet_text = stripped.split("- ", 1)[1].strip() + if bullet_text: + description_lines.append(bullet_text) + description_end_line = i + 1 + elif _is_description_line(stripped) and description_lines: + # Continuation of previous bullet line. + description_lines[-1] += " " + stripped.lstrip().strip() + description_end_line = i + 1 + + i += 1 + + return description_lines, description_start_line, description_end_line + + +def parse_entry_descriptions( + index_content: str, + entry_line_numbers: Dict[str, int], +) -> Dict[str, tuple[bool, Optional[str], Optional[int]]]: + """ + Parse descriptive text (indented bullets) for each index entry. 
+ """ + lines = index_content.split("\n") + descriptions: Dict[str, tuple[bool, Optional[str], Optional[int]]] = {} + entry_pattern = r"^\s*-\s+\*\*`([^`]+)`\*\*" + + line_to_entry: Dict[int, str] = { + line_num: name for name, line_num in entry_line_numbers.items() + } + + for line_num, _line in enumerate(lines, 1): + if line_num not in line_to_entry: + continue + + entry_name = line_to_entry[line_num] + ( + description_lines, + description_start_line, + _description_end_line, + ) = _collect_description_lines(lines, line_num, entry_pattern) + + if description_lines: + description_text = " ".join(description_lines).strip() + has_description = len(description_text) >= 20 + descriptions[entry_name] = ( + has_description, + description_text if has_description else None, + description_start_line, + ) + else: + descriptions[entry_name] = (False, None, None) + + return descriptions + + +def _parse_overview_section( + lines: list[str], + end_line: int, +) -> Optional[ProseSection]: + headings: List[tuple[int, str, int]] = [] + for line_num, line in enumerate(lines[:end_line], 1): + match = _INDEX_HEADING_RE.match(line) + if not match: + continue + level = len(match.group(1)) + heading_text = match.group(2).strip() + if level < 2: + continue + headings.append((level, heading_text, line_num)) + + if not headings: + return None + + root: Optional[ProseSection] = None + stack: List[ProseSection] = [] + nodes_by_line: Dict[int, ProseSection] = {} + + def _split_heading(heading_text: str) -> tuple[str, Optional[str]]: + num_match = _INDEX_SECTION_NUMBER_RE.match(heading_text) + if not num_match: + return heading_text, None + return num_match.group(2).strip(), num_match.group(1).strip() + + for level, heading_text, line_num in headings: + heading_str, heading_num = _split_heading(heading_text) + node = ProseSection( + heading_str=heading_str, + heading_num=heading_num, + heading_level=level, + heading_line=line_num, + content="", + parent_section=None, + child_sections=[], + 
has_code=False, + code_blocks=[], + ) + while stack and stack[-1].heading_level >= level: + stack.pop() + if stack: + node.parent_section = stack[-1] + stack[-1].child_sections.append(node) + else: + root = node + stack.append(node) + nodes_by_line[line_num] = node + + for idx, (_level, _heading_text, line_num) in enumerate(headings): + node_line_start = line_num + 1 + node_line_end = end_line if idx + 1 >= len(headings) else headings[idx + 1][2] - 1 + node_lines = lines[node_line_start - 1:node_line_end] + content = "\n".join(node_lines).strip() + node = nodes_by_line.get(line_num) + if node is None: + continue + node.content = content + node.lines = (node_line_start, node_line_end) + + in_code = False + code_start = None + for offset, line in enumerate(node_lines, node_line_start): + stripped = line.strip() + if stripped.startswith("```"): + if not in_code: + in_code = True + code_start = offset + else: + in_code = False + code_type = stripped[3:].strip().split()[0] if stripped[3:].strip() else "" + node.code_blocks.append((code_start or offset, offset, code_type)) + node.has_code = True + + return root + + +def _extract_title(lines: list[str]) -> str: + for line in lines: + title_match = _TITLE_HEADING_RE.match(line) + if title_match: + return title_match.group(1).strip() + return "" + + +def _find_first_def_section_line(lines: list[str]) -> int: + for line_num, line in enumerate(lines, 1): + heading = _parse_heading_line(line) + if not heading: + continue + section_number, _heading_text, heading_level = heading + if heading_level not in (2, 3, 4): + continue + if section_number.split(".")[0] == "0": + continue + return line_num + return len(lines) + + +def _parse_heading_line(line: str) -> Optional[tuple[str, str, int]]: + heading_match = _INDEX_HEADING_RE.match(line) + if not heading_match: + return None + hashes = heading_match.group(1) + heading_level = len(hashes) + if heading_level not in (2, 3, 4): + return None + raw_heading = 
heading_match.group(2).strip() + n = _INDEX_SECTION_NUMBER_RE.match(raw_heading) + if not n: + return None + section_number = n.group(1).strip() + heading_text = n.group(2).strip() + return section_number, heading_text, heading_level + + +def _update_heading_context( + heading_level: int, + node: "IndexSection", + current_h2: Optional["IndexSection"], + current_h3: Optional["IndexSection"], +) -> tuple[Optional["IndexSection"], Optional["IndexSection"], Optional["IndexSection"]]: + if heading_level == 2: + return node, None, None + if heading_level == 3: + return current_h2, node, None + return current_h2, current_h3, node + + +def _parse_sections( + lines: list[str], + index_section_cls: Type["IndexSection"], +) -> tuple[ + Dict[str, "IndexSection"], + List[str], + Dict[str, List[int]], + Dict[int, "IndexSection"], +]: + sections: Dict[str, "IndexSection"] = {} + section_order: List[str] = [] + section_path_lines: Dict[str, List[int]] = {} + nodes_by_line: Dict[int, IndexSection] = {} + + current_h2: Optional[IndexSection] = None + current_h3: Optional[IndexSection] = None + + for line_num, line in enumerate(lines, 1): + heading = _parse_heading_line(line) + if not heading: + continue + section_number, heading_text, heading_level = heading + if section_number.split(".")[0] == "0": + continue + parent = None + if heading_level == 3: + parent = current_h2 + elif heading_level == 4: + parent = current_h3 + if parent is None and heading_level != 2: + continue + + heading_kind = index_section_cls.derive_heading_kind(heading_text) + node = index_section_cls( + section_number=section_number, + heading_level=heading_level, + parent_heading=parent, + heading_text=heading_text, + kind=heading_kind, + ) + if parent is not None: + parent.add_child(node) + + section_path = node.path_label() + section_path_lines.setdefault(section_path, []).append(line_num) + nodes_by_line[line_num] = node + if section_path in sections: + continue + + sections[section_path] = node + 
section_order.append(section_path) + + current_h2, current_h3, _current_h4 = _update_heading_context( + heading_level, + node, + current_h2, + current_h3, + ) + + return sections, section_order, section_path_lines, nodes_by_line + + +def _parse_entries( + lines: list[str], + nodes_by_line: Dict[int, "IndexSection"], + index_entry_cls: Type["IndexEntry"], +) -> None: + current_h2: Optional["IndexSection"] = None + current_h3: Optional["IndexSection"] = None + current_h4: Optional["IndexSection"] = None + + for line_num, line in enumerate(lines, 1): + if line_num in nodes_by_line: + node = nodes_by_line[line_num] + current_h2, current_h3, current_h4 = _update_heading_context( + node.heading_level, + node, + current_h2, + current_h3, + ) + continue + + entry_match = _INDEX_ENTRY_RE.match(line) + if not entry_match: + continue + + raw_name = entry_match.group(1) + name = normalize_generic_name(raw_name) + + section_node = current_h4 or current_h3 or current_h2 + if section_node is None: + continue + + link_match = _INDEX_LINK_RE.search(line) + link_text = "" + link_file = "" + link_anchor: Optional[str] = None + if link_match: + link_text = link_match.group(1).strip() + link_file = link_match.group(2).strip() + anchor = link_match.group(3) + link_anchor = anchor.strip() if anchor else None + + entry = index_entry_cls( + name=name, + raw_name=raw_name, + current_section=section_node.path_label(), + link_text=link_text, + link_file=link_file, + link_anchor=link_anchor, + line_number=line_num, + kind=section_node.kind, + ) + section_node.add_entry(entry) + section_node.current_entries[name] = entry + + +def _validate_unique_headings(section_path_lines: Dict[str, List[int]]) -> None: + duplicates = {path: lines for path, lines in section_path_lines.items() if len(lines) > 1} + if not duplicates: + return + details = ", ".join( + f"{path} (lines {', '.join(str(line_num) for line_num in dup_lines)})" + for path, dup_lines in sorted(duplicates.items()) + ) + raise 
ValueError(f"Duplicate headings detected in index file: {details}") + + +def _populate_entry_description_fields( + lines: list[str], + section_order: List[str], + sections: Dict[str, "IndexSection"], +) -> None: + for section_path in section_order: + section = sections[section_path] + for entry in section.entries: + desc_lines, desc_start, desc_end = _collect_description_lines( + lines, + entry.line_number, + r"^\s*-\s+\*\*`([^`]+)`\*\*", + ) + entry.description_lines = desc_lines + entry.description_line_start = desc_start + entry.description_line_end = desc_end + if desc_lines: + desc_text = " ".join(desc_lines).strip() + entry.description_text = desc_text + entry.has_description = len(desc_text) >= 20 + + +def _add_unsorted_sections( + sections: Dict[str, "IndexSection"], + index_section_cls: Type["IndexSection"], +) -> list[str]: + unsorted_types = index_section_cls( + section_number="0", + heading_level=2, + parent_heading=None, + heading_text="Unsorted Types", + kind="type", + ) + unsorted_methods = index_section_cls( + section_number="0", + heading_level=2, + parent_heading=None, + heading_text="Unsorted Methods", + kind="method", + ) + unsorted_funcs = index_section_cls( + section_number="0", + heading_level=2, + parent_heading=None, + heading_text="Unsorted Functions", + kind="func", + ) + unsorted_paths = [ + unsorted_types.path_label(), + unsorted_methods.path_label(), + unsorted_funcs.path_label(), + ] + sections[unsorted_paths[0]] = unsorted_types + sections[unsorted_paths[1]] = unsorted_methods + sections[unsorted_paths[2]] = unsorted_funcs + return unsorted_paths + + +def parse_index( + index_content: str, + *, + index_section_cls: Type["IndexSection"], + index_entry_cls: Type["IndexEntry"], + parsed_index_cls: Type["ParsedIndex"], +) -> "ParsedIndex": + """ + Parse docs/tech_specs/api_go_defs_index.md into structured sections and entries. 
+ """ + lines = index_content.split("\n") + title = _extract_title(lines) + first_def_section_line = _find_first_def_section_line(lines) + overview = _parse_overview_section(lines, first_def_section_line - 1) + if overview: + overview.file_path = "api_go_defs_index.md" + + sections, section_order, section_path_lines, nodes_by_line = _parse_sections( + lines, + index_section_cls, + ) + _parse_entries(lines, nodes_by_line, index_entry_cls) + _validate_unique_headings(section_path_lines) + _populate_entry_description_fields(lines, section_order, sections) + unsorted_paths = _add_unsorted_sections(sections, index_section_cls) + + return parsed_index_cls( + sections=sections, + section_order=section_order, + overview=overview, + unsorted_paths=unsorted_paths, + title=title, + ) diff --git a/scripts/lib/_index_utils_rendering.py b/scripts/lib/_index_utils_rendering.py new file mode 100644 index 00000000..95e47298 --- /dev/null +++ b/scripts/lib/_index_utils_rendering.py @@ -0,0 +1,129 @@ +from __future__ import annotations + +from typing import List + +from lib._validation_utils import ProseSection, generate_anchor_from_heading + + +def format_prose_heading(section: ProseSection) -> str: + if section.heading_num: + if section.heading_level == 2: + return f"{section.heading_num}. 
{section.heading_str}" + return f"{section.heading_num} {section.heading_str}" + return section.heading_str + + +def render_toc(parsed_index) -> List[str]: + toc_lines: List[str] = [] + if parsed_index.overview: + label = format_prose_heading(parsed_index.overview) + if label: + toc_lines.append( + f"- [{label}]({generate_anchor_from_heading(label, include_hash=True)})" + ) + for section_path in parsed_index.section_order: + section = parsed_index.sections.get(section_path) + if section is None: + continue + label = section.heading_label() + indent = max(section.heading_level - 2, 0) * 2 + prefix = " " * indent + anchor = generate_anchor_from_heading(label, include_hash=True) + toc_lines.append(f"{prefix}- [{label}]({anchor})") + return toc_lines + + +def render_prose_section(parsed_index, section: ProseSection) -> List[str]: + lines: List[str] = [] + heading_label = format_prose_heading(section) + if heading_label: + lines.append(f"{'#' * section.heading_level} {heading_label}") + lines.append("") + if section.content: + lines.extend(section.content.splitlines()) + lines.append("") + for child in section.child_sections: + lines.extend(render_prose_section(parsed_index, child)) + return lines + + +def render_section_markdown(parsed_index, section) -> List[str]: + lines = [f"{'#' * section.heading_level} {section.heading_label()}"] + lines.append("") + for entry in section.expected_entries.values(): + current_entry = parsed_index.find_current_entry(entry.name) + raw_name = entry.raw_name + link_text = entry.link_text + link_target = entry.link_target() + if current_entry: + raw_name = current_entry.raw_name or raw_name + link_text = current_entry.link_text or link_text + link_target = current_entry.link_target() + if current_entry.needs_link_update: + link_target = entry.link_target() + link_label = link_text or "Spec" + lines.append(f"- **`{raw_name}`** - [{link_label}]({link_target})") + if entry.description_lines: + for desc_line in entry.description_lines: + if 
desc_line.startswith("CONT: "): + lines.append(f" {desc_line[len('CONT: '):]}") + else: + lines.append(f" - {desc_line}") + lines.append("") + return lines + + +def index_to_markdown(parsed_index) -> str: + lines: List[str] = [] + title = parsed_index.title.strip() if parsed_index.title else "" + if title: + lines.append(f"# {title}") + lines.append("") + + toc_lines = render_toc(parsed_index) + if toc_lines: + lines.extend(toc_lines) + lines.append("") + + if parsed_index.overview: + lines.extend(render_prose_section(parsed_index, parsed_index.overview)) + + for section_path in parsed_index.section_order: + section = parsed_index.sections.get(section_path) + if section is None: + continue + lines.extend(render_section_markdown(parsed_index, section)) + + return "\n".join(lines).rstrip() + "\n" + + +def render_full_tree(parsed_index) -> List[str]: + lines: List[str] = [] + for section_path in parsed_index.section_order: + section = parsed_index.sections.get(section_path) + if section is None: + continue + lines.append(section_path) + entries = dict[str, object](section.expected_entries) + for name, entry in section.current_entries.items(): + if name in entries: + continue + if entry.entry_status in ("orphaned", "removed"): + entries[name] = entry + for entry in sorted(entries.values(), key=lambda item: item.sort_key()): + marker = "" + if entry.entry_status == "added": + marker = " [ADDED]" + elif entry.entry_status == "moved": + marker = " [MOVED]" + elif entry.entry_status == "reordered": + marker = " [REORDERED]" + elif entry.entry_status == "unresolved": + marker = " [UNRESOLVED]" + elif entry.entry_status == "orphaned": + marker = " [ORPHANED]" + elif entry.entry_status == "removed": + marker = " [REMOVED]" + lines.append(f"- {entry.name}{marker}") + lines.append("") + return lines diff --git a/scripts/lib/_validate_go_code_blocks_report.py b/scripts/lib/_validate_go_code_blocks_report.py new file mode 100644 index 00000000..37a6adc9 --- /dev/null +++ 
b/scripts/lib/_validate_go_code_blocks_report.py @@ -0,0 +1,307 @@ +"""Report generation helpers for Go code blocks validation.""" + +from collections import defaultdict +from pathlib import Path +from typing import Dict, List + +from lib._validation_utils import ( + OutputBuilder, + ValidationIssue, + format_issue_message, +) + + +def _append_issues_found_section( + report_lines: List[str], results: List[Dict] +) -> None: + """Append '## Issues Found' and grouped issues to report_lines.""" + report_lines.append('## Issues Found') + report_lines.append('') + + issues_by_type = defaultdict(list) + for result in results: + for issue in result['issues']: + if isinstance(issue, ValidationIssue): + issue = issue.to_dict() + issues_by_type[issue['type']].append((result['file'], issue)) + + for issue_type, issues in sorted(issues_by_type.items()): + report_lines.append(f'### {issue_type.replace("_", " ").title()} Issues') + report_lines.append('') + + for file_path, issue in issues: + report_lines.append(f'**File:** `{file_path}`') + if 'start_line' in issue: + report_lines.append(f'**Lines:** {issue["start_line"]}-{issue["end_line"]}') + if 'heading' in issue: + report_lines.append(f'**Heading:** {issue["heading"]}') + if 'type_count' in issue: + report_lines.append(f'**Type definitions found:** {issue["type_count"]}') + if 'func_count' in issue: + report_lines.append(f'**Func definitions found:** {issue["func_count"]}') + if 'func_type_count' in issue: + report_lines.append( + f'**Function type definitions found:** ' + f'{issue["func_type_count"]}' + ) + if 'blocks' in issue: + block_info = ', '.join(f'lines {s}-{e}' for s, e in issue['blocks']) + report_lines.append(f'**Code blocks:** {block_info}') + if 'def_name' in issue: + report_lines.append(f'**Definition:** {issue["def_name"]}') + if 'def_kind' in issue: + report_lines.append(f'**Definition kind:** {issue["def_kind"]}') + report_lines.append(f'**Issue:** {issue["message"]}') + report_lines.append('') + + +def 
_append_detailed_breakdown_section( + report_lines: List[str], results: List[Dict] +) -> None: + """Append '## Detailed File Breakdown' to report_lines.""" + report_lines.append('## Detailed File Breakdown') + report_lines.append('') + + for result in sorted(results, key=lambda x: x['file']): + if not (result['code_blocks'] or result['issues']): + continue + file_name = Path(result["file"]).stem + report_lines.append(f'### {file_name}') + report_lines.append('') + report_lines.append(f'**File path:** `{result["file"]}`') + report_lines.append(f'**Code blocks:** {len(result["code_blocks"])}') + report_lines.append(f'**Issues:** {len(result["issues"])}') + report_lines.append('') + + if result['code_blocks']: + report_lines.append(f'#### {file_name} Code Blocks') + report_lines.append('') + for i, block in enumerate(result['code_blocks'], 1): + report_lines.append( + f'Code block {i}: Lines ' + f'{block["start_line"]}-{block["end_line"]}' + ) + report_lines.append('') + report_lines.append(f'- Heading: {block["heading"] or "(none)"}') + report_lines.append(f'- Type definitions: {block["type_count"]}') + report_lines.append(f'- Func definitions: {block["func_count"]}') + if block.get("func_type_count", 0) > 0: + report_lines.append( + f'- Function type definitions: ' + f'{block["func_type_count"]}' + ) + report_lines.append('') + + if result['issues']: + report_lines.append(f'#### {file_name} Issues') + report_lines.append('') + for issue in result['issues']: + issue_dict = issue.to_dict() if isinstance(issue, ValidationIssue) else issue + report_lines.append(f'- {issue_dict["message"]}') + if 'start_line' in issue_dict: + report_lines.append( + f' - Lines: {issue_dict["start_line"]}-' + f'{issue_dict["end_line"]}' + ) + report_lines.append('') + + +def generate_report(results: List[Dict], output_path: Path) -> None: + """Generate markdown report from audit results.""" + report_lines = [] + + report_lines.append('# Go Code Blocks Validation Report') + 
report_lines.append('') + report_lines.append('This report validates all Go code blocks in the tech specs documentation.') + report_lines.append('') + report_lines.append('## Requirements') + report_lines.append('') + report_lines.append( + '1. Each Go code block should have at most one type or interface ' + 'definition' + ) + report_lines.append( + '2. Each Go code block should have at most one function or method ' + 'definition' + ) + report_lines.append( + '3. Type definitions and function definitions are mutually exclusive ' + 'in a code block' + ) + report_lines.append('4. Each Go code block should be under a different heading') + report_lines.append( + '5. Headings for Go definitions should include the definition name and kind word; ' + 'definition names are preferred in backticks (e.g. `` `Package.Write` Method ``). ' + 'Case inside backticks is ignored for validation.' + ) + report_lines.append( + '6. All type, interface, struct, function, and method definitions should have ' + 'comments preceding them' + ) + report_lines.append('') + + total_files = len(results) + total_blocks = sum(len(r['code_blocks']) for r in results) + total_issues = sum(len(r['issues']) for r in results) + + report_lines.append('## Summary') + report_lines.append('') + report_lines.append(f'- Files audited: {total_files}') + report_lines.append(f'- Total Go code blocks found: {total_blocks}') + report_lines.append(f'- Total issues found: {total_issues}') + report_lines.append('') + + if not total_issues: + report_lines.append('✅ All Go code blocks comply with the requirements!') + report_lines.append('') + else: + _append_issues_found_section(report_lines, results) + _append_detailed_breakdown_section(report_lines, results) + + output_path.parent.mkdir(parents=True, exist_ok=True) + output_path.write_text('\n'.join(report_lines), encoding='utf-8') + + +def _message_parts_for_issue(issue) -> List[str]: + """Build list of message part strings from an issue (ValidationIssue or dict).""" 
+ if isinstance(issue, ValidationIssue): + issue = issue.to_dict() + parts = [] + key_labels = [ + ('heading', 'Heading'), + ('type_count', 'Type definitions'), + ('func_count', 'Func definitions'), + ('func_type_count', 'Function type definitions'), + ('func_name', 'Function/Method'), + ('receiver_type', 'Receiver'), + ('type_name', 'Type'), + ('kind', 'Kind'), + ('def_name', 'Definition'), + ('def_kind', 'Definition kind'), + ] + for key, label in key_labels: + if key in issue and issue[key]: + parts.append(f'{label}: {issue[key]}') + if 'blocks' in issue and issue['blocks']: + block_info = ', '.join(f'lines {s}-{e}' for s, e in issue['blocks']) + parts.append(f'Code blocks: {block_info}') + return parts + + +def _issues_by_type(results) -> Dict[str, int]: + """Return dict of issue_type -> count from results.""" + counts = defaultdict(int) + for result in results: + for issue in result['issues']: + it = ( + issue.issue_type if isinstance(issue, ValidationIssue) + else issue.get('type', 'unknown') + ) + counts[it] += 1 + return dict(counts) + + +WARNING_ISSUE_TYPES = ("function_type_warning", "heading_prefer_backticks") + + +def _has_non_warning_errors(results) -> bool: + """Return True if any issue is not a warning (e.g. 
heading_prefer_backticks).""" + return any( + (isinstance(i, ValidationIssue) and i.issue_type not in WARNING_ISSUE_TYPES) or + (not isinstance(i, ValidationIssue) and i.get('type') not in WARNING_ISSUE_TYPES) + for r in results for i in r['issues'] + ) + + +def _issues_by_type_list(results) -> Dict[str, List[tuple]]: + """Return dict of issue_type -> [(file_path, issue_dict), ...].""" + out = defaultdict(list) + for result in results: + for issue in result['issues']: + if isinstance(issue, ValidationIssue): + it, issue_dict = issue.issue_type, issue.to_dict() + else: + it, issue_dict = issue.get('type', 'unknown'), issue + out[it].append((result['file'], issue_dict)) + return dict(out) + + +def _emit_issues_section(output, results, no_color) -> None: + """Emit errors header and all issues grouped by type.""" + issues_by_type_list = _issues_by_type_list(results) + for issue_type, issues in sorted(issues_by_type_list.items()): + severity = "warning" if issue_type in WARNING_ISSUE_TYPES else "error" + # Show backtick recommendations in default output (not verbose-only) + show_by_default = severity == "warning" and issue_type == "heading_prefer_backticks" + for file_path, issue in issues: + message_parts = _message_parts_for_issue(issue) + message = issue.get('message', '') + if message_parts: + message = f"{message} ({', '.join(message_parts)})" + formatted_msg = format_issue_message( + severity=severity, + issue_type=issue_type.replace('_', ' ').title(), + file_path=file_path, + line_num=issue.get('start_line'), + message=message, + suggestion=issue.get('suggestion'), + no_color=no_color, + ) + if severity == "warning": + output.add_warning_line(formatted_msg, verbose_only=not show_by_default) + else: + output.add_error_line(formatted_msg, verbose_only=False) + + +def print_summary(results, output=None, verbose=False, no_color=False): + """ + Print summary of audit results. 
+ + Args: + results: List of audit results + output: Optional OutputBuilder instance (creates new one if None) + verbose: Verbose mode flag + no_color: Disable colors flag + """ + if output is None: + output = OutputBuilder( + "Go Code Blocks Validation", + "Validates Go code blocks in tech specs", + no_color=no_color, + verbose=verbose, + ) + output.add_header("Go Code Blocks Validation", "Validates Go code blocks in tech specs") + + total_files = len(results) + total_blocks = sum(len(r['code_blocks']) for r in results) + total_issues = sum(len(r['issues']) for r in results) + output.add_summary_header() + output.add_summary_section([ + ("Files audited:", total_files), + ("Total code blocks:", total_blocks), + ("Total issues found:", total_issues), + ]) + + issues_by_type = _issues_by_type(results) + if issues_by_type: + output.add_blank_line("summary") + output.add_line('Breakdown by issue type:', section="summary") + output.add_summary_section([ + (t.replace('_', ' ').title() + ':', c) + for t, c in sorted(issues_by_type.items()) + ]) + if issues_by_type: + if _has_non_warning_errors(results): + output.add_errors_header() + output.add_blank_line("error") + _emit_issues_section(output, results, no_color) + + if not total_issues: + output.add_success_message("All Go code blocks comply with the requirements!") + elif _has_non_warning_errors(results): + output.add_failure_message("Validation failed. 
Please fix the errors above.") + else: + output.add_warnings_only_message( + message="All Go code blocks comply (see recommendations above).", + ) + return output diff --git a/scripts/lib/_validate_go_signature_sync_helpers.py b/scripts/lib/_validate_go_signature_sync_helpers.py new file mode 100644 index 00000000..35b94fd1 --- /dev/null +++ b/scripts/lib/_validate_go_signature_sync_helpers.py @@ -0,0 +1,237 @@ +"""Helpers for Go signature sync validation: report emission.""" + +import re +import sys +from typing import Dict, List, Tuple + +from lib._go_code_utils import Signature +from lib._validation_utils import OutputBuilder, format_issue_message + + +def _parse_location(location_str: str) -> Tuple[str, int]: + """Parse 'path:line' into (path, line_num).""" + if ':' in location_str: + parts = location_str.rsplit(':', 1) + try: + return parts[0], int(parts[1]) + except (ValueError, IndexError): + pass + return location_str, 0 + + +def emit_mismatches( + output: OutputBuilder, mismatches: List, *, no_color: bool = False +) -> bool: + """Emit mismatch errors; return True if any.""" + if not mismatches: + return False + output.add_errors_header() + output.add_line(f"Found {len(mismatches)} signature mismatch(es):", section="error") + output.add_blank_line("error") + for key, impl_sig, spec_sig in sorted(mismatches): + file_path, line_num = _parse_location(impl_sig.location) + msg = format_issue_message( + "error", + "signature_mismatch", + file_path, + line_num=line_num if line_num else None, + message=key, + no_color=no_color, + ) + output.add_error_line(msg) + output.add_error_line(f" Implementation: {impl_sig.normalized_signature()}") + output.add_error_line(f" Location: {impl_sig.location}") + output.add_error_line(f" Specification: {spec_sig.normalized_signature()}") + output.add_error_line(f" Location: {spec_sig.location}") + return True + + +def emit_missing_in_impl( + output: OutputBuilder, + missing_in_impl: List[str], + spec_sigs: Dict[str, Signature], + 
*, + no_color: bool = False, +) -> bool: + """Emit missing-in-impl warnings; return True if any.""" + if not missing_in_impl: + return False + output.add_warnings_header() + output.add_line( + f"Found {len(missing_in_impl)} signature(s) in specs but not in implementation:", + section="warning", + ) + output.add_blank_line("warning") + for key in sorted(missing_in_impl): + spec_sig = spec_sigs[key] + file_path, line_num = _parse_location(spec_sig.location) + msg = format_issue_message( + "warning", + "missing_in_implementation", + file_path, + line_num=line_num if line_num else None, + message=key, + no_color=no_color, + ) + output.add_warning_line(msg) + output.add_warning_line(f" Signature: {spec_sig.normalized_signature()}") + output.add_warning_line(f" Location: {spec_sig.location}") + return True + + +def emit_extra_in_impl_section( + output: OutputBuilder, + args, + *, + extra_in_impl: List[str], + impl_sigs: Dict[str, Signature], + high_confidence_helpers: List[Tuple[str, Signature, List[str]]], + low_confidence_extra: List[str], + errors_public_api_missing: List[Tuple[str, Signature]], + no_color: bool = False, +) -> Tuple[bool, bool]: + """Emit extra-in-impl section. 
Returns (has_errors, has_warnings).""" + if not extra_in_impl: + return (False, False) + has_errors = False + has_warnings = False + if high_confidence_helpers: + if args.verbose: + output.add_verbose_line( + f"Suppressed {len(high_confidence_helpers)} high-confidence helper function(s)" + ) + output.add_blank_line("working_verbose") + for key, impl_sig, reasons in sorted(high_confidence_helpers): + output.add_verbose_line(f" {key}") + output.add_verbose_line(f" Signature: {impl_sig.normalized_signature()}") + output.add_verbose_line(f" Location: {impl_sig.location}") + output.add_verbose_line(f" Reasons: {', '.join(reasons)}") + + if errors_public_api_missing: + has_errors = True + output.add_errors_header() + output.add_line( + f"Found {len(errors_public_api_missing)} public API method(s) " + "in implementation but not in specs:", + section="error", + ) + output.add_blank_line("error") + output.add_line( + "These are public methods on public API types and MUST be documented in tech specs.", + section="error", + ) + output.add_blank_line("error") + for key, impl_sig in sorted(errors_public_api_missing): + file_path, line_num = _parse_location(impl_sig.location) + msg = format_issue_message( + "error", + "public_api_not_in_spec", + file_path, + line_num=line_num if line_num else None, + message=key, + no_color=no_color, + ) + output.add_error_line(msg) + output.add_error_line(f" Implementation: {impl_sig.normalized_signature()}") + output.add_error_line(f" Location: {impl_sig.location}") + + if low_confidence_extra: + has_warnings = True + output.add_warnings_header() + output.add_line( + f"Found {len(low_confidence_extra)} signature(s) in implementation but not in specs:", + section="warning", + ) + if high_confidence_helpers: + if args.verbose: + msg = ( + f"(Suppressed {len(high_confidence_helpers)} high-confidence helper " + "function(s) - see above)" + ) + else: + msg = ( + f"(Suppressed {len(high_confidence_helpers)} high-confidence helper " + "function(s) - 
use --verbose to see them)" + ) + output.add_line(msg, section="warning") + if errors_public_api_missing: + output.add_line( + f"(Also found {len(errors_public_api_missing)} public API method(s) " + "missing from specs - see errors above)", + section="warning", + ) + output.add_blank_line("warning") + output.add_line("(These may be helper functions, but should be checked)", section="warning") + output.add_blank_line("warning") + for key in sorted(low_confidence_extra): + impl_sig = impl_sigs[key] + file_path, line_num = _parse_location(impl_sig.location) + msg = format_issue_message( + "warning", + "extra_in_implementation", + file_path, + line_num=line_num if line_num else None, + message=key, + no_color=no_color, + ) + output.add_warning_line(msg) + output.add_warning_line(f" Signature: {impl_sig.normalized_signature()}") + output.add_warning_line(f" Location: {impl_sig.location}") + elif high_confidence_helpers and not args.verbose: + output.add_verbose_line( + f"Suppressed {len(high_confidence_helpers)} high-confidence helper function(s)" + ) + output.add_verbose_line("(Use --verbose to see the list of suppressed helpers)") + return (has_errors, has_warnings) + + +def emit_sync_final( + output: OutputBuilder, + args, + *, + has_errors: bool, + has_warnings: bool, + impl_sigs: Dict[str, Signature], + spec_sigs: Dict[str, Signature], + mismatches: List, + missing_in_impl: List[str], + extra_in_impl: List[str], + low_confidence_count: int, + high_confidence_count: int, +) -> None: + """Emit success or failure summary and exit.""" + if not has_errors and not has_warnings: + output.add_success_message("All signatures are in sync!") + if args.verbose: + output.add_verbose_line(f" - {len(impl_sigs)} signatures in implementation") + output.add_verbose_line(f" - {len(spec_sigs)} signatures in specs") + output.print() + sys.exit(0) + summary_parts = [] + if mismatches: + summary_parts.append(f"{len(mismatches)} mismatch(es)") + if missing_in_impl: + 
summary_parts.append(f"{len(missing_in_impl)} missing in implementation") + if extra_in_impl: + if low_confidence_count > 0: + summary_parts.append(f"{low_confidence_count} extra in implementation") + if high_confidence_count > 0: + summary_parts.append(f"{high_confidence_count} helper(s) suppressed") + summary_items = [] + for part in summary_parts: + match = re.search(r'(\d+)\s+(.+)', part) + if match: + count = int(match.group(1)) + label = match.group(2).strip() + label = label[0].upper() + label[1:] if label else label + summary_items.append((f"{label}:", count)) + if summary_items: + output.add_summary_section(summary_items) + if has_errors: + output.add_failure_message("Validation failed. Please fix the errors above.") + else: + output.add_warnings_only_message( + verbose_hint="Run with --verbose to see the full warning details.", + ) + output.print() + sys.exit(output.get_exit_code(args.no_fail)) diff --git a/scripts/lib/_validate_go_spec_references_models.py b/scripts/lib/_validate_go_spec_references_models.py new file mode 100644 index 00000000..f9d1725f --- /dev/null +++ b/scripts/lib/_validate_go_spec_references_models.py @@ -0,0 +1,79 @@ +"""Models for Go specification reference validation.""" + +import re +from typing import Optional + + +class SpecReference: + """Represents a specification reference from a Go file.""" + + def __init__(self, file_path, line_num: int, raw_ref: str): + self.file_path = file_path + self.line_num = line_num + self.raw_ref = raw_ref + self.spec_file: Optional[str] = None + self.section: Optional[str] = None + self.heading: Optional[str] = None + self.is_valid_format = False + self.function_name: Optional[str] = None + self.suggested_ref: Optional[str] = None + self._parse() + + def _parse(self): + """Parse the raw reference into components.""" + ref = self.raw_ref.strip() + ref = re.sub(r'^\.\./', '', ref) + ref = re.sub(r'^\.\.\\', '', ref) + if '..' 
in ref: + return + + pattern = r'^([a-zA-Z0-9_\-]+\.md):\s+(\d+(?:\.\d+)*)\.?\s+(.+)$' + match = re.match(pattern, ref) + if match: + self.is_valid_format = True + self.spec_file = match.group(1) + self.section = match.group(2) + self.heading = match.group(3).strip() + else: + anchor_pattern = r'^([a-zA-Z0-9_\-]+\.md)#([^#\s]+)$' + anchor_match = re.match(anchor_pattern, ref) + if anchor_match: + self.spec_file = anchor_match.group(1) + anchor = anchor_match.group(2) + section_match = re.match(r'^(\d+)(?:-|$)', anchor) + if section_match: + digits_str = section_match.group(1) + if len(digits_str) == 1: + self.section = digits_str + elif len(digits_str) == 2: + self.section = f"{digits_str[0]}.{digits_str[1]}" + elif len(digits_str) == 3: + self.section = f"{digits_str[0]}.{digits_str[1]}.{digits_str[2]}" + elif len(digits_str) == 4: + self.section = ( + f"{digits_str[0]}.{digits_str[1]}." + f"{digits_str[2]}.{digits_str[3]}" + ) + return + section_pattern = r'^([a-zA-Z0-9_\-]+\.md)\s+Section\s+(\d+(?:\.\d+)*)(?:\s*-\s*(.+))?$' + section_match = re.match(section_pattern, ref) + if section_match: + self.spec_file = section_match.group(1) + self.section = section_match.group(2) + if section_match.group(3): + self.heading = section_match.group(3).strip() + return + if ':' in ref: + parts = ref.split(':', 1) + self.spec_file = parts[0].strip() + elif ref.endswith('.md'): + self.spec_file = ref.strip() + else: + md_match = re.search(r'([a-zA-Z0-9_\-]+\.md)', ref) + if md_match: + self.spec_file = md_match.group(1) + + def __repr__(self): + if self.is_valid_format: + return f"{self.spec_file}: {self.section} {self.heading}" + return self.raw_ref diff --git a/scripts/lib/_validate_go_spec_references_section_finder.py b/scripts/lib/_validate_go_spec_references_section_finder.py new file mode 100644 index 00000000..bb258c55 --- /dev/null +++ b/scripts/lib/_validate_go_spec_references_section_finder.py @@ -0,0 +1,285 @@ +"""Section and index lookup for SpecValidator 
(validate_go_spec_references).""" + +import re +from pathlib import Path +from typing import List, Optional, Tuple + +from lib._validation_utils import extract_h2_plus_headings_with_sections + + +class SectionFinder: + """Finds sections and correct references from index; used by SpecValidator.""" + + def __init__(self, ctx): + """ctx: object with file_cache, spec_sections, index_entries, index_anchors, + get_spec_file_path, parse_markdown_anchors, is_section_0_or_cross_reference, + clean_heading, format_section_number, validate_anchor, validate_spec_file_name, + is_safe_path, output, verbose. + """ + self._v = ctx + + def extract_function_or_type_name(self, go_file: Path, line_num: int) -> Optional[str]: + """Extract function or type name from Go code context around the specification comment.""" + try: + lines = self._v.file_cache.get_lines(go_file) + start_line = max(0, line_num - 30) + context = ''.join(lines[start_line:line_num]) + + match = re.search(r'func\s+\([^)]*(\w+)[^)]*\)\s+([A-Z][a-zA-Z0-9_]+)\s*\(', context) + if match: + receiver, method = match.group(1), match.group(2) + if receiver and receiver[0].isupper(): + return f"{receiver}.{method}" + return method + match = re.search(r'func\s+([A-Z][a-zA-Z0-9_]+)\s*\(', context) + if match: + return match.group(1) + match = re.search(r'type\s+([A-Z][a-zA-Z0-9_]+)\s+(?:struct|interface|\[|$)', context) + if match: + return match.group(1) + match = re.search(r'const\s+([A-Z][a-zA-Z0-9_]+)\s*=', context) + if match: + return match.group(1) + match = re.search(r'var\s+([A-Z][a-zA-Z0-9_]+)\s*=', context) + if match: + return match.group(1) + return None + except (IOError, OSError, UnicodeDecodeError): + return None + except (ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: + if self._v.output and self._v.verbose: + self._v.output.add_warning_line( + f"Unexpected error extracting function name: {e}" + ) + return None + + def find_correct_reference_from_index( + self, function_name: str + ) -> 
Optional[Tuple[str, str, str]]: + """Find correct reference from index. Returns (spec_file, section_num, heading) or None.""" + if not function_name: + return None + exact = self._try_exact_match(function_name) + if exact: + return exact + function_base = function_name.split('.')[-1] if '.' in function_name else function_name + candidates = self._find_partial_matches(function_base) + if candidates: + candidates.sort(key=lambda x: x[0], reverse=True) + _, spec_file, section_formatted, heading_clean = candidates[0] + return (spec_file, section_formatted, heading_clean) + return None + + def _try_exact_match(self, function_name: str) -> Optional[Tuple[str, str, str]]: + if function_name not in self._v.index_entries: + return None + spec_file = self._v.index_entries[function_name] + anchor = self._v.index_anchors.get(function_name) + if anchor: + parsed = self._parse_anchor_to_section_and_heading(anchor, spec_file) + if parsed: + section_num, heading = parsed + if not self._v.is_section_0_or_cross_reference(section_num, heading): + heading_clean = self._v.clean_heading(section_num, heading) + section_formatted = self._v.format_section_number(section_num, heading) + return (spec_file, section_formatted, heading_clean) + fallback = self._find_section_for_spec_file(spec_file, function_name) + if fallback: + spec_file_fb, section_num, heading_text = fallback + if not self._v.is_section_0_or_cross_reference(section_num, heading_text): + heading_clean = self._v.clean_heading(section_num, heading_text) + section_formatted = self._v.format_section_number(section_num, heading_text) + return (spec_file_fb, section_formatted, heading_clean) + return None + + def _calculate_match_score(self, function_base: str, index_name: str) -> int: + index_base = index_name.split('.')[-1] if '.' in index_name else index_name + if function_base == index_base: + return 100 + if index_name.endswith('.' 
+ function_base): + return 50 + return 0 + + def _create_candidate_from_anchor( + self, anchor: str, spec_file: str, score: int + ) -> Optional[Tuple[int, str, str, str]]: + parsed = self._parse_anchor_to_section_and_heading(anchor, spec_file) + if not parsed: + return None + section_num, heading = parsed + if self._v.is_section_0_or_cross_reference(section_num, heading): + return None + heading_clean = self._v.clean_heading(section_num, heading) + section_formatted = self._v.format_section_number(section_num, heading) + return (score, spec_file, section_formatted, heading_clean) + + def _create_candidate_from_fallback( + self, fallback: Tuple[str, str, str], score: int + ) -> Optional[Tuple[int, str, str, str]]: + spec_file_fb, section_num, heading_text = fallback + if self._v.is_section_0_or_cross_reference(section_num, heading_text): + return None + heading_clean = self._v.clean_heading(section_num, heading_text) + section_formatted = self._v.format_section_number(section_num, heading_text) + return (score, spec_file_fb, section_formatted, heading_clean) + + def _find_partial_matches( + self, function_base: str + ) -> List[Tuple[int, str, str, str]]: + candidates = [] + for index_name, spec_file in self._v.index_entries.items(): + score = self._calculate_match_score(function_base, index_name) + if not score: + continue + anchor = self._v.index_anchors.get(index_name) + if anchor: + candidate = self._create_candidate_from_anchor(anchor, spec_file, score) + if candidate: + candidates.append(candidate) + else: + fallback = self._find_section_for_spec_file(spec_file, index_name) + if fallback: + candidate = self._create_candidate_from_fallback(fallback, score) + if candidate: + candidates.append(candidate) + return candidates + + def _parse_anchor_to_section_and_heading( + self, anchor: str, spec_file: str + ) -> Optional[Tuple[str, str]]: + if (not anchor or not self._v.validate_anchor(anchor) + or not self._v.validate_spec_file_name(spec_file)): + return None + 
spec_path = self._v.get_spec_file_path(spec_file) + if not spec_path or not spec_path.exists() or not self._v.is_safe_path(spec_path): + return None + if spec_file not in self._v.spec_sections: + _, sections = self._v.parse_markdown_anchors(spec_path) + self._v.spec_sections[spec_file] = sections + result = self._find_section_for_anchor_in_cached( + self._v.spec_sections[spec_file], anchor + ) + if result is not None: + return result + return self._find_section_for_anchor_in_headings(spec_path, anchor) + + def _find_section_for_anchor_in_cached( + self, sections: dict, anchor: str + ) -> Optional[Tuple[str, str]]: + for section_num, (heading_text, heading_anchor) in sections.items(): + if (heading_anchor == anchor + and not self._v.is_section_0_or_cross_reference( + section_num, heading_text + )): + return (section_num, heading_text) + return None + + def _find_section_for_anchor_in_headings( + self, spec_path: Path, anchor: str + ) -> Optional[Tuple[str, str]]: + try: + headings = extract_h2_plus_headings_with_sections( + spec_path, file_cache=self._v.file_cache + ) + for _hl, heading_text, _ln, plain_anchor, section_anchor in headings: + if anchor not in (section_anchor, plain_anchor): + continue + section_match = re.match(r'^(\d+(?:\.\d+)*)', heading_text) + if section_match and not self._v.is_section_0_or_cross_reference( + section_match.group(1), heading_text + ): + return (section_match.group(1), heading_text) + except (IOError, OSError, UnicodeDecodeError): + pass + except (ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: + if self._v.output and self._v.verbose: + self._v.output.add_warning_line( + f"Unexpected error parsing anchor: {e}" + ) + return None + + def _find_section_for_spec_file( + self, spec_file: str, context: str + ) -> Optional[Tuple[str, str, str]]: + spec_path = self._v.get_spec_file_path(spec_file) + if not spec_path or not spec_path.exists(): + return None + if spec_file not in self._v.spec_sections: + _, sections = 
self._v.parse_markdown_anchors(spec_path) + self._v.spec_sections[spec_file] = sections + sections = self._v.spec_sections[spec_file] + function_name = context.split('.')[-1] if '.' in context else context + context_lower = context.lower() + function_name_lower = function_name.lower() + context_words = set(re.findall(r'[a-zA-Z]+', context_lower)) + + best_match = None + best_score = 0 + for section_num, (heading_text, _) in sections.items(): + if self._v.is_section_0_or_cross_reference(section_num, heading_text): + continue + heading_lower = heading_text.lower() + heading_words = set(re.findall(r'[a-zA-Z]+', heading_lower)) + score = 0 + if function_name_lower in heading_lower: + score += 100 + if function_name in heading_text: + score += 50 + overlap = len(context_words & heading_words) + score += overlap * 10 + if score > best_score: + best_score = score + best_match = (spec_file, section_num, heading_text) + + if best_match and best_score >= 10: + return best_match + + try: + lines = self._v.file_cache.get_lines(spec_path) + content_match = self._find_section_in_content_by_function( + lines, spec_file, function_name, function_name_lower + ) + if content_match: + return content_match + except (IOError, OSError, UnicodeDecodeError): + pass + except (ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: + if self._v.output and self._v.verbose: + self._v.output.add_warning_line( + f"Unexpected error finding section: {e}" + ) + + if sections: + for section_num, (heading_text, _) in sorted(sections.items()): + if not self._v.is_section_0_or_cross_reference( + section_num, heading_text + ): + return (spec_file, section_num, heading_text) + return None + + def _find_section_in_content_by_function( + self, + lines: List[str], + spec_file: str, + function_name: str, + function_name_lower: str, + ) -> Optional[Tuple[str, str, str]]: + for i, line in enumerate(lines): + if not re.match(r'^#{2,6}\s+', line): + continue + nearby = ' '.join(lines[max(0, 
i):min(len(lines), i + 10)]) + if (function_name not in nearby + and function_name_lower not in nearby.lower()): + continue + heading_match = re.match(r'^#{2,6}\s+(.+)', line) + if not heading_match: + continue + heading_text = heading_match.group(1).strip() + section_match = re.match(r'^(\d+(?:\.\d+)*)', heading_text) + if not section_match: + continue + section_num = section_match.group(1) + if self._v.is_section_0_or_cross_reference(section_num, heading_text): + continue + return (spec_file, section_num, heading_text) + return None diff --git a/scripts/lib/_validate_go_spec_references_validator.py b/scripts/lib/_validate_go_spec_references_validator.py new file mode 100644 index 00000000..1f5d3092 --- /dev/null +++ b/scripts/lib/_validate_go_spec_references_validator.py @@ -0,0 +1,566 @@ +"""SpecValidator for validate_go_spec_references.py.""" + +import re +from pathlib import Path +from typing import Dict, List, Optional, Set, Tuple + +from lib._validation_utils import ( + format_issue_message, + ValidationIssue, + is_safe_path, + validate_spec_file_name, + validate_anchor, + extract_headings_with_section_numbers, + FileContentCache, + DOCS_DIR, + TECH_SPECS_DIR, +) +from lib._validate_go_spec_references_models import SpecReference +from lib._validate_go_spec_references_section_finder import SectionFinder + + +class _SpecValidatorContext: + """Public adapter for SectionFinder (avoids protected-access).""" + + def __init__(self, validator): + self._validator = validator + self.file_cache = validator.file_cache + self.spec_sections = validator.spec_sections + self.index_entries = validator.index_entries + self.index_anchors = validator.index_anchors + + @property + def output(self): + return self._validator.output_for_context + + @property + def verbose(self): + return self._validator.verbose + + def get_spec_file_path(self, spec_file): + return self._validator.get_spec_file_path(spec_file) + + def parse_markdown_anchors(self, file_path): + return 
self._validator.parse_markdown_anchors(file_path) + + def is_section_0_or_cross_reference(self, section_num, heading_text=""): + return self._validator.is_section_0_or_cross_reference( + section_num, heading_text + ) + + def clean_heading(self, section_num, heading_text): + return self._validator.clean_heading(section_num, heading_text) + + def format_section_number(self, section_num, heading_text): + return self._validator.format_section_number(section_num, heading_text) + + def validate_anchor(self, anchor): + return self._validator.validate_anchor(anchor) + + def validate_spec_file_name(self, spec_file): + return self._validator.validate_spec_file_name(spec_file) + + def is_safe_path(self, file_path): + return self._validator.is_safe_path(file_path) + + +class SpecValidator: + """Validates specification references.""" + + def __init__(self, repo_root: Path): + self.repo_root = repo_root + self.docs_dir = repo_root / DOCS_DIR / TECH_SPECS_DIR + self.api_go_dir = repo_root / "api" / "go" + self.index_file = self.docs_dir / "api_go_defs_index.md" + self.verbose = False + self.issues: List[ValidationIssue] = [] + self._output = None # Set in validate_all; used for warnings + + # File content cache to avoid repeated reads + self.file_cache = FileContentCache() + + # Cache of parsed spec files (file -> set of anchors) + self.spec_anchors: Dict[str, Set[str]] = {} + # file -> {section_num: (heading_text, anchor)} + self.spec_sections: Dict[ + str, Dict[str, Tuple[str, str]] + ] = {} + + # Index file (loaded on first validate_all when output is available) + self.index_entries: Dict[str, str] = {} # method/type -> spec_file + self.index_link_texts: Dict[str, str] = {} # method/type -> link text for context + self.index_anchors: Dict[str, str] = {} # method/type -> anchor (e.g., "11-hashtype-type") + self._index_loaded = False + self._section_finder = SectionFinder(_SpecValidatorContext(self)) + + def _is_section_0_or_cross_reference(self, section_num: str, heading_text: 
str = "") -> bool: + """Check if a section is section 0 or a cross-reference section (not source of truth).""" + # Section 0 or sections starting with "0." are not source of truth + if section_num == "0" or section_num.startswith("0."): + return True + # Check if heading contains cross-reference keywords + if heading_text: + heading_lower = heading_text.lower() + if "cross-reference" in heading_lower or "cross-references" in heading_lower: + return True + if "overview" in heading_lower and section_num.startswith("0."): + return True + return False + + def _clean_heading(self, section_num: str, heading_text: str) -> str: + """Remove section number from heading if present, handling edge cases.""" + # Special case: if section is "0" and heading starts with "0. ", return just the text after + if section_num == "0": + if heading_text.startswith("0. "): + return heading_text[3:] + if heading_text.startswith("0 "): + return heading_text[2:] + + # Remove section number prefix (e.g., "2.1 AddFile" -> "AddFile") + # Match the exact section number at the start + section_pattern = re.escape(section_num) + r'(?:\.\s+|\s+)' + heading_clean = re.sub(r'^' + section_pattern, '', heading_text) + + # If that didn't work, try generic pattern + if heading_clean == heading_text: + heading_clean = re.sub(r'^\d+(?:\.\d+)*\s+', '', heading_text) + + return heading_clean + + def _format_section_number( + self, section_num: str, _heading_text: str + ) -> str: + """Format section number for reference strings.""" + return section_num + + def _ensure_index_loaded(self, output=None): + """Load index once; emit warnings via output if provided.""" + if self._index_loaded: + return + self._index_loaded = True + self._load_index(output) + + def _load_index(self, output=None): + """Load api_go_defs_index.md and extract method/type -> spec_file mappings with anchors.""" + if not self.index_file.exists(): + warning_msg = format_issue_message( + "warning", + "Index file not found", + 
str(self.index_file), + message="skipping index validation", + no_color=output.no_color if output else False + ) + if output: + output.add_warning_line(warning_msg) + else: + print(warning_msg) # noqa: T201 + return + + # Verify index file is within repo + if not self._is_safe_path(self.index_file): + warning_msg = format_issue_message( + "warning", + "Index file path unsafe", + str(self.index_file), + message="skipping index validation", + no_color=output.no_color if output else False + ) + if output: + output.add_warning_line(warning_msg) + else: + print(warning_msg) # noqa: T201 + return + + content = self.file_cache.get_content(self.index_file) + + # Pattern: **`Package.AddFile`** - [File Management API - AddFile] + pattern = r'\*\*`([^`]+)`\*\*\s*-\s*\[([^\]]+)\]\(([^)]+)\)' + for match in re.finditer(pattern, content): + method_type = match.group(1) + link_text = match.group(2) + link_target = match.group(3) + + if '#' in link_target: + spec_file, anchor = link_target.split('#', 1) + if not self._validate_anchor(anchor): + continue + self.index_anchors[method_type] = anchor + else: + spec_file = link_target + self.index_anchors[method_type] = None + + if not self._validate_spec_file_name(spec_file): + continue + + self.index_entries[method_type] = spec_file + self.index_link_texts[method_type] = link_text + + def _parse_markdown_anchors( + self, file_path: Path + ) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]: + """Parse markdown file to extract all heading anchors and section numbers.""" + return extract_headings_with_section_numbers( + file_path, min_level=2, max_level=6, file_cache=self.file_cache + ) + + def _is_safe_path(self, file_path: Path) -> bool: + """Check if a path is safe (within repo and no traversal).""" + return is_safe_path(file_path, self.repo_root) + + def _validate_spec_file_name(self, spec_file: str) -> bool: + """Validate that spec file name is safe.""" + return validate_spec_file_name(spec_file) + + def _validate_anchor(self, anchor: 
str) -> bool: + """Validate that anchor is safe.""" + return validate_anchor(anchor) + + # Public API for _SpecValidatorContext / SectionFinder + @property + def output_for_context(self): + return self._output + + def get_spec_file_path(self, spec_file: str) -> Optional[Path]: + return self._get_spec_file_path(spec_file) + + def parse_markdown_anchors( + self, file_path: Path + ) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]: + return self._parse_markdown_anchors(file_path) + + def is_section_0_or_cross_reference( + self, section_num: str, heading_text: str = "" + ) -> bool: + return self._is_section_0_or_cross_reference(section_num, heading_text) + + def clean_heading(self, section_num: str, heading_text: str) -> str: + return self._clean_heading(section_num, heading_text) + + def format_section_number(self, section_num: str, heading_text: str) -> str: + return self._format_section_number(section_num, heading_text) + + def validate_anchor(self, anchor: str) -> bool: + return self._validate_anchor(anchor) + + def validate_spec_file_name(self, spec_file: str) -> bool: + return self._validate_spec_file_name(spec_file) + + def is_safe_path(self, file_path: Path) -> bool: + return self._is_safe_path(file_path) + + def _get_spec_file_path(self, spec_file: str) -> Optional[Path]: + """Get the full path to a spec file with security validation.""" + if not self._validate_spec_file_name(spec_file): + return None + file_path = self.docs_dir / spec_file + if not self._is_safe_path(file_path): + return None + return file_path + + def _compute_suggested_ref_from_sections( + self, ref: SpecReference, sections: Dict[str, Tuple[str, str]] + ) -> Optional[str]: + """Compute suggested_ref from cached sections (section in or similar).""" + if ref.section in sections: + heading_text, _ = sections[ref.section] + if self._is_section_0_or_cross_reference(ref.section, heading_text): + similar = [ + s for s in sections.keys() + if not self._is_section_0_or_cross_reference(s, 
sections[s][0]) + and (ref.section.startswith(s) or s.startswith(ref.section)) + ] + if not similar: + return None + section_key = similar[0] + else: + section_key = ref.section + else: + similar = [ + s for s in sections.keys() + if not self._is_section_0_or_cross_reference(s, sections[s][0]) + and (ref.section.startswith(s) or s.startswith(ref.section)) + ] + if not similar: + return None + section_key = similar[0] + actual_heading, _ = sections[section_key] + heading_clean = self._clean_heading(section_key, actual_heading) + section_formatted = self._format_section_number(section_key, actual_heading) + return f"{ref.spec_file}: {section_formatted} {heading_clean}" + + def _try_suggest_ref_from_index(self, ref: SpecReference) -> None: + """If ref has function_name, try to set ref.suggested_ref from index.""" + if not ref.function_name: + return + correct_ref = self._section_finder.find_correct_reference_from_index( + ref.function_name + ) + if not correct_ref: + return + spec_file, section_num, heading = correct_ref + if not self._is_section_0_or_cross_reference(section_num, heading): + ref.suggested_ref = f"{spec_file}: {section_num} {heading}" + + def _ensure_spec_sections_loaded(self, ref: SpecReference, spec_path: Path) -> None: + """Load spec_anchors and spec_sections for ref.spec_file if not cached.""" + if ref.spec_file not in self.spec_anchors: + anchors, sections = self._parse_markdown_anchors(spec_path) + self.spec_anchors[ref.spec_file] = anchors + self.spec_sections[ref.spec_file] = sections + + def _build_invalid_format_issues(self, ref: SpecReference) -> List[ValidationIssue]: + if ref.spec_file and ref.section: + spec_path = self._get_spec_file_path(ref.spec_file) + if spec_path and spec_path.exists(): + self._ensure_spec_sections_loaded(ref, spec_path) + ref.suggested_ref = self._compute_suggested_ref_from_sections( + ref, self.spec_sections[ref.spec_file] + ) + message_parts = [] + if not ref.suggested_ref: + message_parts.extend([ + "Invalid 
format. Expected: 'file_name.md: section_number heading_text'", + f"Got: '{ref.raw_ref}'", + "Example: 'api_file_mgmt_addition.md: 2.1 AddFile Package Method'", + ]) + else: + message_parts.append(f"Invalid reference: '{ref.raw_ref}'") + if ref.spec_file: + spec_path = self._get_spec_file_path(ref.spec_file) + if not spec_path or not spec_path.exists(): + message_parts.append(f"Spec file not found: {ref.spec_file}") + if not message_parts: + return [] + return [ + ValidationIssue.create( + "invalid_spec_ref_format", + ref.file_path, + ref.line_num, + ref.line_num, + message=" ".join(message_parts), + severity="error", + suggestion=ref.suggested_ref, + raw_ref=ref.raw_ref, + spec_file=ref.spec_file, + ) + ] + + def _build_missing_spec_file_issue(self, ref: SpecReference) -> List[ValidationIssue]: + if ref.spec_file: + return [] + return [ + ValidationIssue.create( + "missing_spec_file", + ref.file_path, + ref.line_num, + ref.line_num, + message="No spec file specified in reference", + severity="error", + raw_ref=ref.raw_ref, + ) + ] + + def _resolve_spec_path_errors( + self, + ref: SpecReference, + ) -> Tuple[Optional[Path], List[ValidationIssue]]: + spec_path = self._get_spec_file_path(ref.spec_file) + if spec_path and spec_path.exists(): + return spec_path, [] + return None, [ + ValidationIssue.create( + "spec_file_not_found", + ref.file_path, + ref.line_num, + ref.line_num, + message=f"Spec file not found: {ref.spec_file}", + severity="error", + raw_ref=ref.raw_ref, + spec_file=ref.spec_file, + ) + ] + + def _build_section_not_found_issues( + self, + ref: SpecReference, + sections: Dict[str, Tuple[str, str]], + ) -> List[ValidationIssue]: + if ref.section in sections: + return [] + similar = [ + s for s in sections.keys() + if not self._is_section_0_or_cross_reference(s, sections[s][0]) + and (ref.section.startswith(s) or s.startswith(ref.section)) + ] + if similar: + actual_heading, _ = sections[similar[0]] + heading_clean = self._clean_heading(similar[0], 
actual_heading) + if not ref.suggested_ref: + ref.suggested_ref = f"{ref.spec_file}: {similar[0]} {heading_clean}" + message = ( + f"Section '{ref.section}' not found. " + f"Did you mean: '{similar[0]} {heading_clean}'?" + ) + else: + message = f"Section '{ref.section}' not found in {ref.spec_file}" + if sections: + available = [ + (num, self._clean_heading(num, heading)) + for num, (heading, _) in sorted(sections.items()) + if not self._is_section_0_or_cross_reference(num, heading) + ][:5] + if available: + message += ( + " Available sections: " + f"{', '.join(f'{n} {h}' for n, h in available)}..." + ) + return [ + ValidationIssue.create( + "section_not_found", + ref.file_path, + ref.line_num, + ref.line_num, + message=message, + severity="error", + suggestion=ref.suggested_ref, + raw_ref=ref.raw_ref, + spec_file=ref.spec_file, + section=ref.section, + ) + ] + + def _build_heading_mismatch_issues( + self, + ref: SpecReference, + actual_heading: str, + ) -> List[ValidationIssue]: + heading_clean = self._clean_heading(ref.section, actual_heading) + ref_heading_clean = self._clean_heading(ref.section, ref.heading) + normalized_ref = re.sub(r'\s+', ' ', ref_heading_clean.lower().strip()) + normalized_actual = re.sub(r'\s+', ' ', heading_clean.lower().strip()) + if ( + normalized_ref == normalized_actual + or normalized_ref in normalized_actual + or normalized_actual in normalized_ref + ): + return [] + if not ref.suggested_ref: + ref.suggested_ref = f"{ref.spec_file}: {ref.section} {heading_clean}" + return [ + ValidationIssue.create( + "heading_mismatch", + ref.file_path, + ref.line_num, + ref.line_num, + message=( + f"Heading mismatch for section {ref.section}. " + f"Expected: '{heading_clean}'. Got: '{ref.heading}'. 
" + f"Correct format: '{ref.spec_file}: {ref.section} {heading_clean}'" + ), + severity="error", + suggestion=ref.suggested_ref, + raw_ref=ref.raw_ref, + spec_file=ref.spec_file, + section=ref.section, + ) + ] + + def _validate_reference(self, ref: SpecReference) -> List[ValidationIssue]: + """Validate a single reference. Returns list of ValidationIssue objects (empty if valid).""" + self._try_suggest_ref_from_index(ref) + + if not ref.is_valid_format: + return self._build_invalid_format_issues(ref) + + missing_spec_file = self._build_missing_spec_file_issue(ref) + if missing_spec_file: + return missing_spec_file + + spec_path, spec_path_errors = self._resolve_spec_path_errors(ref) + if spec_path_errors: + return spec_path_errors + + self._ensure_spec_sections_loaded(ref, spec_path) + sections = self.spec_sections[ref.spec_file] + section_errors = self._build_section_not_found_issues(ref, sections) + if section_errors: + return section_errors + + actual_heading, _ = sections[ref.section] + return self._build_heading_mismatch_issues(ref, actual_heading) + + def find_spec_references(self, go_file: Path) -> List[SpecReference]: + """Find all specification references in a Go file.""" + references = [] + try: + lines = self.file_cache.get_lines(go_file) + for line_num, line in enumerate(lines, 1): + match = re.search(r'//\s*Specification:\s*(.+)', line) + if match: + ref_text = match.group(1).strip() + if ref_text: + ref = SpecReference(go_file, line_num, ref_text) + ref.function_name = self._section_finder.extract_function_or_type_name( + go_file, line_num + ) + references.append(ref) + except (IOError, OSError) as e: + error = ValidationIssue.create( + "file_read_error", go_file, 0, 0, + message=f"Could not read file: {e}", severity='error' + ) + self.issues.append(error) + except UnicodeDecodeError as e: + error = ValidationIssue.create( + "file_encoding_error", go_file, 0, 0, + message=f"Could not decode file (encoding issue): {e}", severity='error' + ) + 
self.issues.append(error) + except (ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: + error = ValidationIssue.create( + "unexpected_error", go_file, 0, 0, + message=f"Unexpected error reading file: {e}", severity='error' + ) + self.issues.append(error) + return references + + def validate_all( + self, _check_index: bool = False, verbose: bool = False, output=None + ) -> Tuple[int, List[str]]: + """Validate all references in Go files. Returns (error_count, error_messages).""" + all_issues: List[ValidationIssue] = [] + self.verbose = verbose + self.issues = all_issues + self._output = output + + if output: + self._ensure_index_loaded(output) + + go_files = list(self.api_go_dir.rglob("*.go")) + if output: + output.add_verbose_line( + f"Scanning {len(go_files)} Go files for specification references..." + ) + + for go_file in go_files: + references = self.find_spec_references(go_file) + if not references: + continue + if verbose and output: + rel_path = go_file.relative_to(self.repo_root) + output.add_verbose_line( + f" Checking {rel_path} ({len(references)} reference(s))" + ) + for ref in references: + if verbose and output: + output.add_verbose_line(f" Validating: {ref.raw_ref}") + errors = self._validate_reference(ref) + if errors: + all_issues.extend(errors) + + error_messages = [ + issue.format_message(no_color=False) if isinstance(issue, ValidationIssue) + else str(issue) + for issue in all_issues + ] + return len(all_issues), error_messages diff --git a/scripts/lib/_validate_go_spec_signature_consistency_helpers.py b/scripts/lib/_validate_go_spec_signature_consistency_helpers.py new file mode 100644 index 00000000..1e497ecf --- /dev/null +++ b/scripts/lib/_validate_go_spec_signature_consistency_helpers.py @@ -0,0 +1,254 @@ +"""Helpers for Go signature consistency validation: extraction and counting.""" + +import re +from pathlib import Path +from typing import Dict, List, Optional, Tuple + +from lib._go_code_utils import ( + InterfaceParser, 
+ Signature, + is_public_name, + parse_go_def_signature, + is_example_code, +) +from lib._validation_utils import OutputBuilder, ValidationIssue, parse_no_color_flag + +_RE_INTERFACE_PATTERN = re.compile(r'^\s*type\s+\w+(?:\s*\[[^\]]+\])?\s+interface\s*\{') +_RE_STRUCT_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*(\[[^\]]+\]))?\s+struct\s*\{') + + +def count_interface_methods(content: str, start_line: int, end_line: int) -> int: + """Count methods in an interface definition within a specific line range.""" + lines = content.split('\n') + method_count = 0 + interface_parser = InterfaceParser() + + for i in range(start_line - 1, min(end_line, len(lines))): + line = lines[i] + interface_name = interface_parser.check_interface_start(line) + if interface_name: + continue + if interface_parser.is_in_interface(): + still_in_interface = interface_parser.update_brace_depth(line) + if still_in_interface and interface_parser.brace_depth > 0: + sig = parse_go_def_signature(line, location="") + if sig and sig.kind in ('func', 'method'): + method_count += 1 + if not still_in_interface: + break + return method_count + + +def count_struct_fields(content: str, start_line: int, end_line: int) -> int: + """Count fields in a struct definition.""" + lines = content.split('\n') + field_count = 0 + in_struct = False + brace_depth = 0 + + for i in range(start_line - 1, min(end_line, len(lines))): + line = lines[i] + stripped = line.strip() + if re.match(r'^\s*type\s+\w+\s+struct\s*\{', line): + in_struct = True + brace_depth = stripped.count('{') - stripped.count('}') + continue + if in_struct: + brace_depth += stripped.count('{') - stripped.count('}') + if brace_depth > 0 and stripped and not stripped.startswith('//'): + if re.match(r'^\s*\w+\s+\w+', stripped): + if not re.match(r'^\s*func\s+', stripped): + field_count += 1 + if brace_depth <= 0: + break + return field_count + + +def _count_struct_fields_in_block( + block_lines: List[str], start_index: int, initial_brace_depth: int +) -> 
int: + """Count struct fields from block_lines starting at start_index.""" + field_count = 0 + temp_brace_depth = initial_brace_depth + for j in range(start_index + 1, len(block_lines)): + temp_line = block_lines[j] + temp_stripped = temp_line.strip() + if not temp_stripped or temp_stripped.startswith('//'): + continue + temp_brace_depth += temp_stripped.count('{') - temp_stripped.count('}') + if temp_brace_depth > 0 and temp_stripped: + if re.match(r'^\s*\w+\s+\w+', temp_stripped): + if not re.match(r'^\s*func\s+', temp_stripped): + field_count += 1 + if temp_brace_depth <= 0: + break + return field_count + + +def extract_signatures_from_block( + block_lines: List[str], + start_line: int, + relative_path: Path, + code_content: str, + lines: List[str], +) -> List[Signature]: + """Extract struct/func/method/type signatures from one Go code block.""" + result: List[Signature] = [] + for i, line in enumerate(block_lines): + line_num = start_line + i + stripped = line.strip() + if not stripped or stripped.startswith('//'): + continue + if _RE_INTERFACE_PATTERN.match(stripped): + continue + is_example = is_example_code( + code_content, start_line, lines=lines, check_single_line=i + ) + struct_match = _RE_STRUCT_PATTERN.match(stripped) + if struct_match: + if is_example: + continue + name = struct_match.group(1) + generic_params = struct_match.group(2) + is_public = is_public_name(name) if name else False + brace_depth = stripped.count('{') - stripped.count('}') + has_full_body = brace_depth > 0 + field_count = ( + _count_struct_fields_in_block(block_lines, i, brace_depth) + if has_full_body + else 0 + ) + result.append(Signature( + name=name, + kind='type', + location=f"{relative_path}:{line_num}", + is_public=is_public, + has_body=has_full_body, + field_count=field_count, + generic_params=generic_params + )) + continue + sig = parse_go_def_signature(line, location=f"{relative_path}:{line_num}") + if not sig: + continue + if sig.kind in ('func', 'method'): + 
result.append(Signature( + name=sig.name, + kind=sig.kind, + receiver=sig.receiver, + params=sig.params, + returns=sig.returns, + location=f"{relative_path}:{line_num}", + is_public=sig.is_public, + has_body=True + )) + else: + if is_example: + continue + if sig.kind != 'interface': + result.append(Signature( + name=sig.name, + kind=sig.kind, + location=f"{relative_path}:{line_num}", + is_public=sig.is_public, + has_body=False, + generic_params=sig.generic_params + )) + return result + + +def parse_cli_args( + argv: List[str], +) -> Tuple[bool, bool, bool, Optional[str], Optional[str]]: + """Parse CLI flags for the signature consistency validator.""" + verbose = '--verbose' in argv or '-v' in argv + no_color = parse_no_color_flag(argv) + no_fail = '--no-fail' in argv + output_file = None + target_paths_str = None + for i, arg in enumerate(argv): + if arg in ('--output', '-o') and i + 1 < len(argv): + output_file = argv[i + 1] + elif arg in ('--path', '-p') and i + 1 < len(argv): + target_paths_str = argv[i + 1] + return verbose, no_color, no_fail, output_file, target_paths_str + + +def build_output( + verbose: bool, + no_color: bool, + output_file: Optional[str], +) -> OutputBuilder: + """Create an OutputBuilder for the signature consistency validator.""" + return OutputBuilder( + "Go Signature Consistency", + "Validates signature consistency within tech specs", + no_color=no_color, + verbose=verbose, + output_file=output_file + ) + + +def split_issues( + issues: List[ValidationIssue], +) -> Tuple[List[ValidationIssue], List[ValidationIssue]]: + """Split issues into error and warning lists.""" + errors: List[ValidationIssue] = [] + warnings: List[ValidationIssue] = [] + for issue in issues: + if issue.matches(severity='error'): + errors.append(issue) + if issue.matches(severity='warning'): + warnings.append(issue) + return errors, warnings + + +def emit_issues( + output: OutputBuilder, + errors: List[ValidationIssue], + warnings: List[ValidationIssue], + no_color: 
bool, +) -> None: + """Emit issue lines to the output builder.""" + for error in errors: + output.add_error_line(error.format_message(no_color=no_color)) + for warning in warnings: + output.add_warning_line(warning.format_message(no_color=no_color)) + + +def emit_summary( + output: OutputBuilder, + all_signatures: List[Signature], + signatures_by_key: Dict[str, List[Signature]], + errors: List[ValidationIssue], + warnings: List[ValidationIssue], +) -> None: + """Emit summary section for signature validation.""" + summary_items = [ + ("Signatures checked:", len(all_signatures)), + ("Unique definitions:", len(signatures_by_key)), + ] + if errors: + summary_items.append(("Errors found:", len(errors))) + if warnings: + summary_items.append(("Warnings found:", len(warnings))) + output.add_summary_header() + output.add_summary_section(summary_items) + + +def emit_final_message( + output: OutputBuilder, + errors: List[ValidationIssue], + warnings: List[ValidationIssue], +) -> None: + """Emit the final success, warnings-only, or failure message.""" + if errors: + output.add_failure_message("Validation failed. Please fix the errors above.") + return + if warnings: + output.add_warnings_only_message( + message="All signatures are consistent! (Some warnings were found - see above)", + verbose_hint="Run with --verbose to see the full warning details.", + ) + else: + output.add_success_message('All signatures are consistent!') diff --git a/scripts/lib/_validate_heading_numbering_helpers.py b/scripts/lib/_validate_heading_numbering_helpers.py new file mode 100644 index 00000000..19fe9a40 --- /dev/null +++ b/scripts/lib/_validate_heading_numbering_helpers.py @@ -0,0 +1,146 @@ +""" +Structure validation and check helpers for heading numbering. 
+""" + +import re +from pathlib import Path + +from lib._validation_utils import ValidationIssue +from lib._validate_heading_numbering_models import MAX_ORGANIZATIONAL_PROSE_LINES + + +def _is_first_h2_numbered(first_h2): + """Return True if the first H2 heading has valid numbering.""" + return ( + first_h2.numbers and + len(first_h2.numbers) > 0 and + first_h2.original_number != "MISSING" + ) + + +def _record_unnumbered_issues( + filepath, unnumbered_headings, issues, first_error_line +): + """Append heading_missing_numbering issues for each unnumbered heading.""" + for line_num, level, _heading_text, full_line, heading_info in unnumbered_headings: + msg = ( + f"H{level} heading is missing numbering. " + "This document uses numbered headings, so all headings must be numbered." + ) + error = ValidationIssue.create( + "heading_missing_numbering", + Path(filepath), + line_num, + line_num, + message=msg, + severity='error', + heading=full_line, + heading_info=heading_info + ) + issues.append(error) + heading_info.issue = error + if first_error_line[filepath] is None: + first_error_line[filepath] = line_num + + +def _check_first_h2_value(filepath, first_h2, issues): + """If first H2 number is not 0 or 1, append issue and return True. Else return False.""" + if first_h2.numbers[0] in [0, 1]: + return False + error = ValidationIssue.create( + "heading_first_h2_numbering", + Path(filepath), + first_h2.line_num, + first_h2.line_num, + message=( + f"First H2 heading must be numbered '0' or '1', " + f"got '{first_h2.numbers[0]}'. " + "Please run a markdown linter to fix basic heading order, " + "then re-run this script." 
+ ), + severity='error', + heading=first_h2.full_line, + heading_info=first_h2 + ) + issues.append(error) + first_h2.issue = error + return True + + +def _check_heading_parents(filepath, headings, issues, log_fn=None): + """Append heading_no_parent issues for headings with no parent; log H2 parent warning.""" + for heading in headings: + if heading.level == 2 and heading.parent is not None: + if log_fn: + log_fn(f" Warning: H2 heading at line {heading.line_num} has a parent") + elif heading.level > 2 and heading.parent is None: + error = ValidationIssue.create( + "heading_no_parent", + Path(filepath), + heading.line_num, + heading.line_num, + message=( + f"H{heading.level} heading has no parent. " + "Please run a markdown linter to fix basic heading order, " + "then re-run this script." + ), + severity='error', + heading=heading.full_line, + heading_info=heading + ) + issues.append(error) + heading.issue = error + + +def validate_heading_structure( + filepath, headings, unnumbered_headings, *, issues, first_error_line, log_fn=None +): + """ + Validate heading structure after parsing. Mutates headings (sets .issue) + and first_error_line. Appends to issues. + + Returns: + List of headings (may be modified with issues). + """ + if not headings: + return [] + h2_headings = [h for h in headings if h.level == 2] + if not h2_headings: + return headings + first_h2 = min(h2_headings, key=lambda h: h.line_num) + if _is_first_h2_numbered(first_h2) and unnumbered_headings: + _record_unnumbered_issues(filepath, unnumbered_headings, issues, first_error_line) + if not _is_first_h2_numbered(first_h2): + return headings + if _check_first_h2_value(filepath, first_h2, issues): + return headings + _check_heading_parents(filepath, headings, issues, log_fn) + return headings + + +def is_go_code_related_heading(heading_text): + """ + Return True if the heading appears to reference a Go code element. 
+ """ + if not heading_text: + return False + camel_case_pattern = r'\b[a-z][a-zA-Z]*[A-Z][a-zA-Z]*\b' + if re.search(camel_case_pattern, heading_text): + return True + method_pattern = r'\b[a-z][a-zA-Z]*\.[A-Z][a-zA-Z]*\b' + if re.search(method_pattern, heading_text): + return True + go_kind_words = ['Struct', 'Function', 'Method', 'Interface', 'Type'] + for kind_word in go_kind_words: + pattern = rf'\b[a-z][a-zA-Z]*\s+{kind_word}\b' + if re.search(pattern, heading_text): + return True + method_kind_pattern = rf'\b[a-z][a-zA-Z]*\.[A-Z][a-zA-Z]*\s+{kind_word}\b' + if re.search(method_kind_pattern, heading_text): + return True + return False + + +def get_max_organizational_prose_lines(): + """Return constant for organizational heading check (for callers that need it).""" + return MAX_ORGANIZATIONAL_PROSE_LINES diff --git a/scripts/lib/_validate_heading_numbering_models.py b/scripts/lib/_validate_heading_numbering_models.py new file mode 100644 index 00000000..b044cd08 --- /dev/null +++ b/scripts/lib/_validate_heading_numbering_models.py @@ -0,0 +1,34 @@ +""" +Models and constants for heading numbering validation. 
+""" + +import re + +RE_HEADING_PATTERN = re.compile(r'^(#{1,})\s+(.+)$') +RE_NUMBERED_HEADING_PATTERN = re.compile(r'^([0-9]+(?:\.[0-9]+)*)\.?\s+(.+)$') + +MAX_HEADING_NUMBER_SEGMENT = 20 +MAX_ORGANIZATIONAL_PROSE_LINES = 5 + + +class HeadingInfo: + """Represents a heading with its metadata for sorting.""" + + def __init__(self, file, line_num, level, numbers, *, heading_text, full_line, + parent=None, issue=None): + self.file = file + self.line_num = line_num + self.level = level + self.numbers = numbers + self.heading_text = heading_text + self.full_line = full_line + self.parent = parent + self.issue = issue + self.original_number = '.'.join(map(str, numbers)) if numbers else '' + self.corrected_number = None + self.has_period = False + self.corrected_capitalization = None + + def sort_key(self): + """Return a sort key for proper numeric ordering.""" + return (tuple(self.numbers), self.level) diff --git a/scripts/lib/_validate_heading_numbering_report.py b/scripts/lib/_validate_heading_numbering_report.py new file mode 100644 index 00000000..b86e395a --- /dev/null +++ b/scripts/lib/_validate_heading_numbering_report.py @@ -0,0 +1,484 @@ +""" +Report/output helpers for heading numbering validation. + +Emit error blocks and numbering-display section used by validate_heading_numbering. 
+""" + +import re +from collections import defaultdict + +from lib._validation_utils import ValidationIssue, format_issue_message + + +def emit_org_errors_block( + output_builder, errors_by_type, errors_by_line, *, + rel_path_fn, build_corrected_full_line_fn, no_color +): + """Emit organizational heading errors section.""" + org_errors = errors_by_type.get("organizational_heading", []) + output_builder.add_line( + f"Organizational Heading Errors ({len(org_errors)}):", + section="error" + ) + output_builder.add_blank_line("error") + org_errors_by_line = { + k: v for k, v in errors_by_line.items() + if any( + isinstance(e, ValidationIssue) and + e.issue_type == "organizational_heading" + for e in v + ) + } + sorted_org_lines = sorted( + org_errors_by_line.keys(), key=lambda k: (k[0], k[1]) + ) + for file, line_num in sorted_org_lines: + line_errors = [ + e for e in org_errors_by_line[(file, line_num)] + if isinstance(e, ValidationIssue) and + e.issue_type == "organizational_heading" + ] + if not line_errors: + continue + rel_file = rel_path_fn(file) + messages = [ + e.message if isinstance(e, ValidationIssue) else e.get('message', '') + for e in line_errors + ] + combined_message = "; ".join(messages) + suggestion = None + for error in line_errors: + heading_info = error.extra_fields.get('heading_info') + if heading_info: + suggestion = build_corrected_full_line_fn(heading_info) + break + error_msg = format_issue_message( + "error", + "Organizational heading", + rel_file, + line_num=line_num, + message=combined_message, + suggestion=suggestion, + no_color=no_color + ) + output_builder.add_error_line(error_msg) + output_builder.add_blank_line("error") + + +def emit_formatting_errors_block( + output_builder, errors_by_type, errors_by_line, org_errors, *, + rel_path_fn, build_corrected_full_line_fn, no_color +): + """Emit heading formatting errors section.""" + formatting_errors = errors_by_type.get("heading_formatting", []) + if org_errors: + 
output_builder.add_separator(section="error") + output_builder.add_blank_line("error") + output_builder.add_line( + f"Heading Formatting Errors ({len(formatting_errors)}):", + section="error" + ) + output_builder.add_blank_line("error") + formatting_errors_by_line = { + k: v for k, v in errors_by_line.items() + if any( + isinstance(e, ValidationIssue) and + e.issue_type == "heading_formatting" + for e in v + ) + } + sorted_formatting_lines = sorted( + formatting_errors_by_line.keys(), key=lambda k: (k[0], k[1]) + ) + for file, line_num in sorted_formatting_lines: + line_errors = [ + e for e in formatting_errors_by_line[(file, line_num)] + if isinstance(e, ValidationIssue) and + e.issue_type == "heading_formatting" + ] + if not line_errors: + continue + rel_file = rel_path_fn(file) + messages = [ + e.message if isinstance(e, ValidationIssue) else e.get('message', '') + for e in line_errors + ] + combined_message = "; ".join(messages) + suggestion = None + for error in line_errors: + heading_info = error.extra_fields.get('heading_info') + if heading_info: + suggestion = build_corrected_full_line_fn(heading_info) + break + error_msg = format_issue_message( + "error", + "Heading formatting", + rel_file, + line_num=line_num, + message=combined_message, + suggestion=suggestion, + no_color=no_color + ) + output_builder.add_error_line(error_msg) + output_builder.add_blank_line("error") + + +def emit_numbering_errors_block( + output_builder, errors_by_line, *, + org_errors, formatting_errors, numbering_errors, + rel_path_fn, build_corrected_full_line_fn, no_color +): + """Emit heading numbering errors section.""" + if org_errors or formatting_errors: + output_builder.add_separator(section="error") + output_builder.add_blank_line("error") + output_builder.add_line( + f"Heading Numbering Errors ({len(numbering_errors)}):", + section="error" + ) + output_builder.add_blank_line("error") + numbering_errors_by_line = { + k: v for k, v in errors_by_line.items() + if any( + 
isinstance(e, ValidationIssue) and + e.issue_type not in ( + "organizational_heading", "heading_formatting" + ) + for e in v + ) + } + sorted_numbering_lines = sorted( + numbering_errors_by_line.keys(), key=lambda k: (k[0], k[1]) + ) + for file, line_num in sorted_numbering_lines: + line_errors = numbering_errors_by_line[(file, line_num)] + if not line_errors: + continue + rel_file = rel_path_fn(file) + messages = [ + e.message if isinstance(e, ValidationIssue) else e.get('message', '') + for e in line_errors + ] + combined_message = "; ".join(messages) + suggestion = None + for error in line_errors: + heading_info = error.extra_fields.get('heading_info') + if heading_info: + suggestion = build_corrected_full_line_fn(heading_info) + break + error_msg = format_issue_message( + "error", + "Heading numbering", + rel_file, + line_num=line_num, + message=combined_message, + suggestion=suggestion, + no_color=no_color + ) + output_builder.add_error_line(error_msg) + output_builder.add_blank_line("error") + + +def filter_headings_with_numbering_errors(errored_headings): + """Return list of headings that have numbering errors.""" + result = [] + for heading in errored_headings: + if heading.original_number == "MISSING" and heading.corrected_number: + result.append(heading) + elif heading.original_number and heading.corrected_number: + current = heading.original_number.rstrip('.') + correct = heading.corrected_number.rstrip('.') + if current != correct: + result.append(heading) + return result + + +def emit_numbering_display_block( + output_builder, _filepath, first_error_line, rel_file, + headings_with_numbering_errors, *, + _build_corrected_full_line_fn=None +): + """ + Emit the 'Sorted headings from first error' block (format for apply_heading_corrections). 
+ """ + output_builder.add_separator(section="error") + output_builder.add_line( + f"Sorted headings from first error (line {first_error_line}) " + f"in {rel_file}:", + section="error" + ) + output_builder.add_separator(section="error") + output_builder.add_blank_line("error") + output_builder.add_line( + "The following headings should be in this order " + "(sorted by numeric values):", + section="error" + ) + output_builder.add_blank_line("error") + output_builder.add_line( + "Format: Line X: [CURRENT] -> [CORRECT] Title", + section="error" + ) + output_builder.add_blank_line("error") + + sorted_headings = sorted( + headings_with_numbering_errors, + key=lambda h: (h.line_num, h.sort_key()) + ) + max_line_num = ( + max(h.line_num for h in sorted_headings) + if sorted_headings else 0 + ) + line_num_width = len(str(max_line_num)) + + h2_headings_in_output = [h for h in sorted_headings if h.level == 2] + display_period = False + if h2_headings_in_output: + first_h2 = min(h2_headings_in_output, key=lambda h: h.line_num) + display_period = first_h2.has_period + + for heading in sorted_headings: + current_number_str = heading.original_number + if heading.corrected_number is None: + correct_number_str = current_number_str + else: + correct_number_str = heading.corrected_number + + if current_number_str == "MISSING": + current_display = "MISSING" + elif heading.level == 2 and display_period: + current_display = f"{current_number_str}." + else: + current_display = current_number_str + + current_for_comparison = current_number_str.rstrip('.') + correct_for_comparison = correct_number_str.rstrip('.') + needs_change = current_for_comparison != correct_for_comparison + + is_duplicate_error = ( + heading.issue and + isinstance(heading.issue, ValidationIssue) and + heading.issue.matches(issue_type="heading_duplicate") + ) + + if heading.level == 2 and display_period: + correct_display = f"{correct_number_str}." 
+ else: + correct_display = correct_number_str + + heading_text_display = heading.heading_text + if heading.corrected_capitalization: + heading_text_display = heading.corrected_capitalization + + if is_duplicate_error and not needs_change: + output_builder.add_error_line( + f"Line {heading.line_num:{line_num_width}d}: " + f"{'#' * heading.level} [{current_display}] (DUPLICATE) " + f"{heading_text_display}" + ) + else: + output_builder.add_error_line( + f"Line {heading.line_num:{line_num_width}d}: " + f"{'#' * heading.level} [{current_display}] -> " + f"[{correct_display}] {heading_text_display}" + ) + + output_builder.add_blank_line("error") + + +def _partition_errors_warnings(validator_issues): + """Return (errors, warnings) from validator_issues.""" + errors = [i for i in validator_issues if i.matches(severity='error')] + warnings = [i for i in validator_issues if i.matches(severity='warning')] + return (errors, warnings) + + +def _emit_errors_section( + output_builder, + errors, + headings_from_first_error, + first_error_line, + *, + rel_path_fn, + build_corrected_full_line_fn, + no_color, +): + """Emit errors header and all error blocks (org, formatting, numbering, display).""" + output_builder.add_errors_header() + output_builder.add_line(f"Found {len(errors)} error(s):", section="error") + output_builder.add_blank_line("error") + errors_by_type = defaultdict(list) + for error in errors: + errors_by_type[error.issue_type].append(error) + errors_by_line = defaultdict(list) + for error in errors: + errors_by_line[(error.file, error.start_line)].append(error) + org_errors = errors_by_type.get("organizational_heading", []) + if org_errors: + emit_org_errors_block( + output_builder, errors_by_type, errors_by_line, + rel_path_fn=rel_path_fn, + build_corrected_full_line_fn=build_corrected_full_line_fn, + no_color=no_color, + ) + formatting_errors = errors_by_type.get("heading_formatting", []) + if formatting_errors: + emit_formatting_errors_block( + output_builder, 
errors_by_type, errors_by_line, org_errors, + rel_path_fn=rel_path_fn, + build_corrected_full_line_fn=build_corrected_full_line_fn, + no_color=no_color, + ) + numbering_errors = [ + e for e in errors + if isinstance(e, ValidationIssue) and + e.issue_type not in ("organizational_heading", "heading_formatting") + ] + if numbering_errors: + emit_numbering_errors_block( + output_builder, errors_by_line, + org_errors=org_errors, + formatting_errors=formatting_errors, + numbering_errors=numbering_errors, + rel_path_fn=rel_path_fn, + build_corrected_full_line_fn=build_corrected_full_line_fn, + no_color=no_color, + ) + if numbering_errors: + for filepath in sorted(headings_from_first_error.keys()): + errored_headings = headings_from_first_error[filepath] + if not errored_headings: + continue + headings_with_numbering_errors = filter_headings_with_numbering_errors( + errored_headings + ) + if not headings_with_numbering_errors: + continue + first_error_line_val = first_error_line[filepath] + rel_file = rel_path_fn(filepath) + emit_numbering_display_block( + output_builder, filepath, first_error_line_val, rel_file, + headings_with_numbering_errors, + _build_corrected_full_line_fn=build_corrected_full_line_fn, + ) + + +def _emit_warnings_section( + output_builder, warnings, *, rel_path_fn, build_corrected_full_line_fn, no_color +): + """Emit warnings header and warning lines.""" + output_builder.add_warnings_header() + output_builder.add_line(f"Found {len(warnings)} warning(s):", section="warning") + output_builder.add_blank_line("warning") + warnings_by_line = defaultdict(list) + for warning in warnings: + if isinstance(warning, ValidationIssue): + key = (warning.file, warning.start_line) + else: + key = (warning.get('file', ''), warning.get('line_num', 0)) + warnings_by_line[key].append(warning) + + def _category_for_issue_type(issue_type): + if issue_type == "heading_capitalization": + return "Heading capitalization" + if issue_type == "organizational_heading": + return 
"Organizational heading" + return "Heading numbering" + + for file, line_num in sorted(warnings_by_line.keys(), key=lambda k: (k[0], k[1])): + line_warnings = warnings_by_line[(file, line_num)] + rel_file = rel_path_fn(file) + messages = [] + first_issue_type = None + for warning in line_warnings: + if isinstance(warning, ValidationIssue): + if first_issue_type is None: + first_issue_type = warning.issue_type + msg = warning.message + else: + msg = warning.get('message', '') + if "expected" in msg.lower(): + msg = re.sub(r",\s*expected\s+['\"][^'\"]+['\"]", "", msg) + messages.append(msg) + combined_message = "; ".join(messages) + category = _category_for_issue_type(first_issue_type or "") + suggestion = None + for warning in line_warnings: + heading_info = ( + warning.extra_fields.get('heading_info') + if isinstance(warning, ValidationIssue) + else getattr(warning, 'heading_info', None) + ) + if heading_info: + is_cap = ( + isinstance(warning, ValidationIssue) + and warning.issue_type == "heading_capitalization" + ) + if is_cap: + suggestion = ( + f"{heading_info.full_line} => " + f"{build_corrected_full_line_fn(heading_info)}" + ) + else: + suggestion = build_corrected_full_line_fn(heading_info) + break + warning_msg = format_issue_message( + "warning", category, rel_file, + line_num=line_num, message=combined_message, + suggestion=suggestion, no_color=no_color, + ) + output_builder.add_warning_line(warning_msg) + + +def _emit_final_message(output_builder, errors, warnings, all_headings): + """Emit success/final_message/failure based on errors and warnings.""" + if not errors and not warnings: + files_checked = len(all_headings) + total_headings = sum(len(h) for h in all_headings.values()) + output_builder.add_summary_header() + output_builder.add_summary_section([ + ("Files checked:", files_checked), + ("Headings checked:", total_headings), + ]) + output_builder.add_success_message("All heading numbering is valid!") + elif not errors: + 
output_builder.add_warnings_only_message( + message="No heading numbering errors found (only warnings).", + ) + else: + output_builder.add_failure_message( + "Validation failed. Please fix the errors above." + ) + + +def print_summary( + validator_issues, + all_headings, + headings_from_first_error, + first_error_line, + *, + rel_path_fn, + build_corrected_full_line_fn, + no_color, + output_builder +): + """ + Print validation summary. Uses validator's rel_path and build_corrected_full_line + via the passed functions; issues/headings/state passed as data. + """ + errors, warnings = _partition_errors_warnings(validator_issues) + if errors: + _emit_errors_section( + output_builder, errors, headings_from_first_error, first_error_line, + rel_path_fn=rel_path_fn, + build_corrected_full_line_fn=build_corrected_full_line_fn, + no_color=no_color, + ) + if warnings: + _emit_warnings_section( + output_builder, warnings, + rel_path_fn=rel_path_fn, + build_corrected_full_line_fn=build_corrected_full_line_fn, + no_color=no_color, + ) + _emit_final_message(output_builder, errors, warnings, all_headings) diff --git a/scripts/lib/_validate_heading_numbering_title_case.py b/scripts/lib/_validate_heading_numbering_title_case.py new file mode 100644 index 00000000..be2bbe4b --- /dev/null +++ b/scripts/lib/_validate_heading_numbering_title_case.py @@ -0,0 +1,241 @@ +""" +Title-case and capitalization helpers for heading numbering validation. + +Pure functions used by validate_heading_numbering to apply Title Case rules +to heading text (preserving backticks, filenames, code-in-parens). +Case inside backticks is not checked and is preserved as-is. 
+""" + +import re + +_RE_SPLIT_WORDS = re.compile(r'\S+|\s+') +_RE_WHITESPACE_ONLY = re.compile(r'^\s+$') +_RE_FILENAME_PATTERN = re.compile(r'^[\w\-]+\.(\w+)$') +_RE_NON_WORD_CHARS = re.compile(r'[^\w]') +_RE_FIRST_LETTER = re.compile(r'[a-zA-Z]') + +_COMMON_EXTENSIONS = [ + 'go', 'md', 'txt', 'json', 'yaml', 'yml', 'xml', 'html', 'css', 'js', + 'ts', 'py', 'sh', 'bat', 'ps1', 'java', 'c', 'cpp', 'h', 'hpp', + 'rs', 'rb', 'php', 'sql', 'csv', 'tsv', 'log', 'conf', 'config', + 'ini', 'toml', 'lock', 'sum', 'mod', 'gitignore', 'editorconfig' +] + +_LOWERCASE_WORDS = { + 'a', 'an', 'the', + 'and', 'but', 'or', 'nor', 'so', 'yet', + 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by', 'from', 'up', 'about', + 'into', 'onto', 'upon', 'over', 'under', 'above', 'below', 'across', + 'via', 'vs' +} + +_CAPITALIZE_PREPOSITIONS = { + 'through', 'between', 'among', 'during', 'before', 'after', 'within', + 'without', 'against', 'along', 'around', 'behind', 'beside', 'beyond', + 'inside', 'outside', 'throughout', 'toward', 'towards', 'underneath' +} + +_PROGRAMMING_KEYWORDS = { + 'return', 'if', 'else', 'for', 'while', 'do', 'switch', 'case', + 'break', 'continue', 'goto', 'throw', 'try', 'catch', 'finally', + 'new', 'delete', 'this', 'super', 'static', 'const', 'let', 'var', + 'function', 'class', 'interface', 'enum', 'type', 'import', 'export', + 'async', 'await', 'yield', 'def', 'lambda', 'pass', 'raise', 'except' +} + +_PHRASAL_PARTICLES = {'up', 'down', 'out', 'off', 'in', 'on', 'over', 'away', 'back'} + +_PHRASAL_VERB_BASES = { + 'clean', 'set', 'look', 'pick', 'give', 'make', 'break', 'build', 'call', + 'check', 'close', 'come', 'cut', 'fill', 'get', 'go', 'grow', 'hang', + 'hold', 'keep', 'line', 'live', 'move', 'open', 'pull', 'put', 'show', + 'sign', 'stand', 'start', 'take', 'turn', 'wake', 'warm', 'wrap', 'bring', + 'carry', 'catch', 'do', 'draw', 'drop', 'end', 'fall', 'find', + 'fix', 'follow', 'hand', 'head', 'help', 'join', 'jump', 'knock', 'lay', + 'leave', 
'let', 'lie', 'lock', 'log', 'mix', 'pass', 'pay', 'point', + 'pop', 'push', 'read', 'run', 'send', 'shut', 'sit', 'slow', 'sort', + 'speed', 'split', 'spread', 'step', 'stick', 'stop', 'switch', 'talk', + 'tear', 'think', 'throw', 'tie', 'try', 'use', 'walk', 'wash', + 'watch', 'wear', 'wind', 'work', 'write' +} + + +def find_backtick_ranges(text): + """Find all backtick-enclosed sections and their positions.""" + backtick_ranges = [] + i = 0 + while i < len(text): + if text[i] == '`': + start = i + i += 1 + while i < len(text) and text[i] != '`': + i += 1 + if i < len(text): + end = i + 1 + backtick_ranges.append((start, end)) + i = end + else: + break + else: + i += 1 + return backtick_ranges + + +def is_in_backticks(pos, backtick_ranges): + """Check if a character position is inside backticks.""" + for start, end in backtick_ranges: + if start <= pos < end: + return True + return False + + +def should_preserve_part(part, part_start, part_end, backtick_ranges): + """Check if a part should be preserved as-is (backticks, underscores, filenames).""" + part_in_backticks = any( + is_in_backticks(pos, backtick_ranges) + for pos in range(part_start, part_end) + ) + if part_in_backticks: + return True + if '_' in part: + return True + filename_match = _RE_FILENAME_PATTERN.match(part) + if filename_match: + extension = filename_match.group(1).lower() + if extension in _COMMON_EXTENSIONS: + return True + return False + + +def is_in_code_parentheses(_part, parts, part_index, text): + """Check if a word is inside parentheses with code-like content (backticks).""" + text_up_to_here = ''.join(parts[:part_index + 1]) + if '(' not in text_up_to_here: + return False + last_open_paren = text_up_to_here.rfind('(') + if last_open_paren < 0: + return False + text_before = ''.join(parts[:part_index + 1]) + open_count = text_before.count('(') - text_before.count(')') + if open_count <= 0: + return False + text_from_paren = text[last_open_paren:] + close_paren_pos = 
text_from_paren.find(')', 1) + if close_paren_pos > 0: + parens_content = text_from_paren[1:close_paren_pos] + else: + parens_content = text_from_paren[1:] + return '`' in parens_content + + +def should_capitalize_word( + word_clean, is_first, is_last, *, is_in_code_parens, previous_word_clean=None +): + """ + Determine if a word should be capitalized based on title case rules. + """ + if is_in_code_parens and word_clean in _PROGRAMMING_KEYWORDS: + return False + if is_first or is_last: + return True + if word_clean in _CAPITALIZE_PREPOSITIONS: + return True + if word_clean in _PHRASAL_PARTICLES and previous_word_clean: + if previous_word_clean in _PHRASAL_VERB_BASES: + return True + if word_clean not in _LOWERCASE_WORDS: + return True + return False + + +def capitalize_word(part, should_cap): + """Apply capitalization to a word part.""" + if not part: + return part + match = _RE_FIRST_LETTER.search(part) + if not match: + return part + first_letter_idx = match.start() + if should_cap: + return ( + part[:first_letter_idx] + + part[first_letter_idx].upper() + + part[first_letter_idx + 1:] + ) + has_internal_capitals = any( + c.isupper() for c in part[first_letter_idx + 1:] + if c.isalpha() + ) + if has_internal_capitals: + if part[first_letter_idx].isupper(): + return ( + part[:first_letter_idx] + + part[first_letter_idx].lower() + + part[first_letter_idx + 1:] + ) + return part + return ( + part[:first_letter_idx] + + part[first_letter_idx].lower() + + part[first_letter_idx + 1:].lower() + ) + + +def to_title_case(text): + """ + Convert text to Title Case following standard rules. + Preserves backticks, filenames, code-in-parens. 
+ """ + if not text: + return text + backtick_ranges = find_backtick_ranges(text) + parts = _RE_SPLIT_WORDS.findall(text) + result_parts = [] + word_indices = [] + char_pos = 0 + part_positions = [] + for i, part in enumerate(parts): + if not _RE_WHITESPACE_ONLY.match(part): + word_indices.append(i) + part_positions.append((char_pos, char_pos + len(part))) + char_pos += len(part) + + if not word_indices: + return text + + for i, part in enumerate(parts): + if _RE_WHITESPACE_ONLY.match(part): + result_parts.append(part) + continue + part_start, part_end = part_positions[i] + if should_preserve_part(part, part_start, part_end, backtick_ranges): + result_parts.append(part) + continue + word_clean = _RE_NON_WORD_CHARS.sub('', part.lower()) + if not word_clean: + result_parts.append(part) + continue + # Preserve single-letter words that are uppercase in the original + # (e.g. "Option A", "Plan B") so we do not suggest "Option a". + if len(word_clean) == 1: + first_letter_match = _RE_FIRST_LETTER.search(part) + if first_letter_match and part[first_letter_match.start()].isupper(): + result_parts.append(part) + continue + is_first = i == word_indices[0] + is_last = i == word_indices[-1] + previous_word_clean = None + if not is_first: + current_word_index = word_indices.index(i) + if current_word_index > 0: + prev_word_i = word_indices[current_word_index - 1] + prev_part = parts[prev_word_i] + previous_word_clean = _RE_NON_WORD_CHARS.sub('', prev_part.lower()) + is_in_code_parens = is_in_code_parentheses(part, parts, i, text) + should_cap = should_capitalize_word( + word_clean, is_first, is_last, + is_in_code_parens=is_in_code_parens, + previous_word_clean=previous_word_clean + ) + result_parts.append(capitalize_word(part, should_cap)) + return ''.join(result_parts) diff --git a/scripts/lib/_validate_links_helpers.py b/scripts/lib/_validate_links_helpers.py new file mode 100644 index 00000000..49053002 --- /dev/null +++ b/scripts/lib/_validate_links_helpers.py @@ -0,0 
+1,234 @@ +"""Helpers for validate_links.py: heading extraction, text normalization, anchor suggestion.""" + +import re +from pathlib import Path +from typing import List + +from lib._validation_utils import ( + extract_headings_with_anchors, + validate_anchor, +) + +# Compiled regex patterns for performance (module level) +_RE_MARKDOWN_FORMAT = re.compile(r'[*_`]') +_RE_SPECIAL_CHARS = re.compile(r'[^a-zA-Z0-9 .-]') +_RE_SPLIT_WORDS = re.compile(r'[\s.\-]+') +_RE_CAMEL_CASE = re.compile(r'([a-z])([A-Z])') +_RE_NUMBERING_PREFIX = re.compile(r'^([0-9]+(?:\.[0-9]+)*)\.?\s+(.+)$') + + +def extract_headings(file_path, file_cache=None): + """ + Extract all headings from a markdown file and generate anchors. + + Args: + file_path: Path to the file + file_cache: Optional FileContentCache instance to use for reading files + + Returns: + dict: Mapping of anchor -> (heading_text, heading_level, line_number) + """ + return extract_headings_with_anchors(Path(file_path), file_cache=file_cache) + + +def normalize_text_for_matching(text: str) -> str: + """ + Normalize text for matching by removing markdown formatting, + converting to lowercase, and stripping special characters. + + Args: + text: Text to normalize + + Returns: + Normalized text string + """ + text = _RE_MARKDOWN_FORMAT.sub('', text) + text = text.lower() + text = _RE_SPECIAL_CHARS.sub('', text) + return text.strip() + + +def extract_words(text: str) -> List[str]: + """ + Extract words from text, handling various separators.
+ + Handles: + - Spaces: "Add File" -> ["add", "file"] + - Dots: "Package.AddFile" -> ["package", "add", "file"] + - Hyphens: "add-file" -> ["add", "file"] + - CamelCase: "AddFile" -> ["add", "file"] + - Mixed: "Package.AddFile" -> ["package", "add", "file"] + + Args: + text: Text to extract words from + + Returns: + List of normalized words (all lowercase) + """ + text = _RE_MARKDOWN_FORMAT.sub('', text) + words = _RE_SPLIT_WORDS.split(text) + all_words = [] + for word in words: + if not word: + continue + camel_split = _RE_CAMEL_CASE.sub(r'\1 \2', word) + camel_words = camel_split.split() + all_words.extend([w.lower() for w in camel_words if w]) + stop_words = {'the', 'a', 'an', 'and', 'or', 'of', 'in', 'on', 'at', 'to', 'for', 'with', 'by'} + return [w for w in all_words if w and w not in stop_words] + + +def strip_numbering_prefix(text: str) -> str: + """ + Strip numbering prefix from heading text (e.g., "1.2.3 Add File" -> "Add File"). + + Args: + text: Heading text that may contain numbering + + Returns: + Text with numbering prefix removed + """ + match = _RE_NUMBERING_PREFIX.match(text) + if match: + return match.group(2).strip() + return text + + +def calculate_word_match_score(link_words: List[str], heading_words: List[str]) -> float: + """ + Calculate word matching score between link text and heading text. 
+ + Args: + link_words: List of words from link text + heading_words: List of words from heading text + + Returns: + Score from 0-100 based on word matching + """ + if not link_words or not heading_words: + return 0.0 + + link_set = set(link_words) + heading_set = set(heading_words) + + if link_set == heading_set: + return 100.0 + + if link_set.issubset(heading_set): + return 90.0 + + matching_words = link_set.intersection(heading_set) + if matching_words: + match_ratio = len(matching_words) / len(link_set) + return 60.0 + (match_ratio * 30.0) + + partial_matches = 0 + for link_word in link_words: + for heading_word in heading_words: + if link_word in heading_word or heading_word in link_word: + partial_matches += 1 + break + + if partial_matches > 0: + partial_ratio = partial_matches / len(link_words) + return 20.0 + (partial_ratio * 40.0) + + return 0.0 + + +def suggest_anchor(link_text, broken_anchor, target_file, heading_cache, verbose=False): + """ + Suggest correct anchor based on weighted heuristics. + + Args: + link_text: Text from the markdown link + broken_anchor: The broken anchor that was not found + target_file: Path to the target file + heading_cache: Dictionary mapping file paths to heading dictionaries + (anchor -> (heading_text, heading_level, line_number)) + verbose: If True, return detailed scoring information + + Returns: + Tuple of (suggested_anchor, confidence_score) or None if no good match found. 
+ If verbose=True, returns (suggested_anchor, confidence_score, score_details) + """ + headings_dict = heading_cache.get(str(target_file), {}) + if not headings_dict: + return None + + normalized_link_text = normalize_text_for_matching(link_text) + link_words = extract_words(link_text) + broken_anchor_words = extract_words(broken_anchor.replace('-', ' ')) + + best_match = None + best_score = 0.0 + best_details = {} + + for anchor, (heading_text, heading_level, line_num) in headings_dict.items(): + if not validate_anchor(anchor): + continue + + heading_text_no_numbering = strip_numbering_prefix(heading_text) + normalized_heading = normalize_text_for_matching(heading_text_no_numbering) + heading_words = extract_words(heading_text_no_numbering) + + scores = {} + + word_score = calculate_word_match_score(link_words, heading_words) + scores['word_match'] = word_score + weighted_word = word_score * 0.4 + + anchor_score = 0.0 + if anchor == broken_anchor: + anchor_score = 100.0 + elif broken_anchor in anchor: + anchor_score = 70.0 + elif anchor in broken_anchor: + anchor_score = 50.0 + else: + anchor_words = extract_words(anchor.replace('-', ' ')) + anchor_word_score = calculate_word_match_score(broken_anchor_words, anchor_words) + anchor_score = anchor_word_score * 0.5 + scores['anchor_similarity'] = anchor_score + weighted_anchor = anchor_score * 0.3 + + context_score = 0.0 + if heading_level == 2: + context_score += 30.0 + elif heading_level == 3: + context_score += 20.0 + elif heading_level == 4: + context_score += 10.0 + if line_num < 100: + context_score += 5.0 + scores['context'] = context_score + weighted_context = context_score * 0.2 + + norm_score = 0.0 + if normalized_link_text == normalized_heading: + norm_score = 100.0 + elif (normalized_link_text in normalized_heading or + normalized_heading in normalized_link_text): + norm_score = 60.0 + scores['normalization'] = norm_score + weighted_norm = norm_score * 0.1 + + total_score = weighted_word + weighted_anchor 
+ weighted_context + weighted_norm + + if total_score > best_score: + best_score = total_score + best_match = anchor + best_details = { + 'heading_text': heading_text, + 'heading_level': heading_level, + 'line_num': line_num, + 'scores': scores, + 'total_score': total_score + } + + if best_match and best_score >= 70.0: + if verbose: + return (best_match, best_score, best_details) + return (best_match, best_score) + + return None diff --git a/scripts/lib/_validation_utils.py b/scripts/lib/_validation_utils.py index acfceeeb..fc6dac7e 100644 --- a/scripts/lib/_validation_utils.py +++ b/scripts/lib/_validation_utils.py @@ -4,2183 +4,111 @@ This module provides common functionality for all validation scripts, including color support, standardized output formatting, and helper functions. -""" - -import os -import sys -import re -import types -import importlib.util -from pathlib import Path -from typing import Optional, List, Set, Tuple, Dict -from dataclasses import dataclass - -# Standard directory names used across validation scripts -DOCS_DIR = 'docs' -TECH_SPECS_DIR = 'tech_specs' -REQUIREMENTS_DIR = 'requirements' -FEATURES_DIR = 'features' - -# Color codes (ANSI escape sequences) -COLOR_GREEN = "32" -COLOR_RED = "31" -COLOR_YELLOW = "33" -COLOR_RESET = "0" - -# Standard separator width -SEPARATOR_WIDTH = 80 - -# Compiled regex patterns for performance (module level) -_RE_HEADING_PATTERN = re.compile(r'^(#{1,6})\s+(.+)$') -_RE_DECIMAL_PATTERN = re.compile(r'\d+\.\d+') -_RE_SENTENCE_END_PATTERN = re.compile(r'[.!?]+(?=\s+|$)') -_RE_HEADING_NUM_PATTERN = re.compile(r'^\d+(?:\.\d+)*$') - - -def supports_color(no_color_flag=False): - """ - Check if colors should be used. 
- - Args: - no_color_flag: If True, disable colors regardless of other conditions - - Returns: - True if colors should be used, False otherwise - """ - if no_color_flag or 'NO_COLOR' in os.environ: - return False - return sys.stdout.isatty() - - -def colorize(text, color_code, no_color_flag=False): - """ - Apply color to text if colors are supported. - - Args: - text: Text to colorize - color_code: ANSI color code (e.g., "32" for green) - no_color_flag: If True, disable colors - - Returns: - Colorized text if colors are supported, otherwise original text - """ - if supports_color(no_color_flag): - return f"\033[{color_code}m{text}\033[0m" - return text - - -def format_summary_line(label, value, label_width=25, value_width=6): - """ - Format a summary line with aligned columns. - - Args: - label: Label text (left-aligned) - value: Value to display (right-aligned) - label_width: Width for label column (default: 25) - value_width: Width for value column (default: 6) - - Returns: - Formatted string with aligned columns - """ - return f"{label:<{label_width}} {value:>{value_width}}" - - -def calculate_label_width(labels, min_width=25, max_width=50): - """ - Calculate optimal label width for a set of summary labels. - - Args: - labels: List of label strings - min_width: Minimum label width (default: 25) - max_width: Maximum label width (default: 50) - - Returns: - Optimal label width for formatting - """ - if not labels: - return min_width - max_label_len = max(len(label) for label in labels) - return min(max(max_label_len + 1, min_width), max_width) - - -def parse_no_color_flag(args): - """ - Parse --nocolor or --no-color flag from command line arguments. 
- - Args: - args: List of command line arguments (typically sys.argv) - - Returns: - True if --nocolor or --no-color flag is present, False otherwise - """ - return '--nocolor' in args or '--no-color' in args - - -def format_issue_message( - severity, issue_type, file_path, line_num=None, message=None, suggestion=None, no_color=False -): - """ - Format an error or warning message with consistent structure. - - Args: - severity: Either "error" or "warning" (case-insensitive) - issue_type: Short description of the issue type (e.g., "Heading without Req") - file_path: Path to the file (will be converted to string) - line_num: Optional line number - message: Optional additional message/details - suggestion: Optional suggestion for fixing the issue (formatted as " -> {suggestion}") - no_color: If True, disable colors - - Returns: - Formatted error or warning message string with color applied - """ - severity_lower = severity.lower() - if severity_lower not in ('error', 'warning'): - raise ValueError(f"severity must be 'error' or 'warning', got '{severity}'") - - is_error = severity_lower == 'error' - prefix = "ERROR" if is_error else "WARNING" - color_code = COLOR_RED if is_error else COLOR_YELLOW - - file_str = str(file_path) - if line_num is not None: - location = f"{file_str}:{line_num}" - else: - location = file_str - - # Build the message parts - parts = [f"{prefix}: {issue_type}: {location}"] - - if message: - parts.append(message) - - # Build the full message - if len(parts) > 1: - issue_msg = ": ".join(parts) - else: - issue_msg = parts[0] - - # Add suggestion if present (without extra colon since it already has " -> ") - if suggestion: - issue_msg = f"{issue_msg} -> {suggestion}" - - return colorize(issue_msg, color_code, no_color) - - -class OutputBuilder: - """ - Builder for consistent script output formatting. - - Handles headers, summaries, success messages, and spacing automatically. 
- Tracks line types (error, warning, info) and supports verbose mode filtering. - Automatically orders output sections in the correct sequence. - """ - - # Line type constants - LINE_INFO = 'info' - LINE_ERROR = 'error' - LINE_WARNING = 'warning' - LINE_VERBOSE = 'verbose' - - def __init__(self, title, description, no_color=False, output_file=None, verbose=False): - """ - Initialize the output builder. - - Args: - title: Script title for header - description: Brief description for header - no_color: If True, disable colors - output_file: Optional file path to write output to - verbose: If True, include verbose-only lines in output - """ - # Separate sections for automatic ordering - self.header_lines = [] - self.working_verbose_lines = [] # Working/progress verbose output - self.summary_lines = [] - self.warning_lines = [] - self.error_lines = [] - self.final_message_lines = [] # Success messages, etc. - - # Metadata for each section - self.header_metadata = [] - self.working_verbose_metadata = [] - self.summary_metadata = [] - self.warning_metadata = [] - self.error_metadata = [] - self.final_message_metadata = [] - - self.no_color = no_color - self.output_file = output_file - self.verbose = verbose - self._last_was_blank = {} # Track blank lines per section - self._has_warnings = False - self._has_errors = False - self._header_printed = False # Track if header has been streamed - self._streamed_lines = [] # Track streamed lines for file output - self._streamed_header_count = 0 # Number of header lines streamed - self._streamed_verbose_count = 0 # Number of working_verbose lines streamed - self._summary_header_added = False # Track if summary header has been added - self._has_success_message = False # Track if success message has been added - self._has_failure_message = False # Track if failure message has been added - self._errors_header_added = False # Track if errors header has been added - self._warnings_header_added = False # Track if warnings header has been 
added - - # Add header immediately (will stream if verbose) - self.add_header(title, description) - - def _add_to_section(self, section, line, line_type=LINE_INFO, verbose_only=False): - """ - Internal method to add a line to a specific section. - - Strips whitespace-only lines (they should be added via add_blank_line instead). - """ - # Strip whitespace-only lines - they should be added via add_blank_line - # Only process non-empty lines (empty lines should use add_blank_line) - if line and line.strip(): - section_lines = getattr(self, f"{section}_lines") - section_metadata = getattr(self, f"{section}_metadata") - section_lines.append(line) - section_metadata.append((line_type, verbose_only)) - self._last_was_blank[section] = False - - def _add_blank_to_section(self, section): - """Internal method to add a blank line to a specific section.""" - if not self._last_was_blank.get(section, False): - section_lines = getattr(self, f"{section}_lines") - section_metadata = getattr(self, f"{section}_metadata") - section_lines.append("") - section_metadata.append((self.LINE_INFO, False)) - self._last_was_blank[section] = True - - def add_header(self, title, description): - """ - Add script header with separators. - - If verbose=True, prints header immediately. Otherwise buffers it. 
- - Args: - title: Script title - description: Brief description - """ - separator = "=" * SEPARATOR_WIDTH - header_text = f"{title} - {description}" - header_lines = [separator, header_text, separator] - - # Store in header section for final output - for line in header_lines: - self._add_to_section("header", line) - - # If verbose, print header immediately - if self.verbose and not self._header_printed: - for line in header_lines: - print(line) - if self.output_file: - self._streamed_lines.append(line) - self._header_printed = True - # Count header lines that will be in final output (after filtering) - filtered_header = self._filter_section(self.header_lines, self.header_metadata) - self._streamed_header_count = len(filtered_header) - - def add_summary_header(self): - """Add summary section header.""" - if self._summary_header_added: - return # Already added, avoid duplicates - separator = "=" * SEPARATOR_WIDTH - self._add_to_section("summary", separator) - self._add_to_section("summary", "Summary") - self._add_to_section("summary", separator) - self._summary_header_added = True - - def add_summary_section(self, items, label_width=None, value_width=6): - """ - Add summary items with consistent formatting. 
- - Automatically adds summary header if: - - There are summary items - - AND (verbose is True OR there are warnings OR there are errors) - - AND summary header hasn't been added yet - - Args: - items: List of (label, value) tuples - label_width: Optional label width (auto-calculated if None) - value_width: Value column width (default: 6) - """ - if not items: - return - - # Automatically add summary header if conditions are met - should_show_summary = self.verbose or self._has_warnings or self._has_errors - if should_show_summary and not self._summary_header_added: - self.add_summary_header() - - if label_width is None: - labels = [item[0] for item in items] - label_width = calculate_label_width(labels) - - for label, value in items: - line = format_summary_line(label, value, label_width, value_width) - self._add_to_section("summary", line) - - def add_success_message(self, message): - """ - Add success message with proper spacing. - - Adds: 1 blank line before, message with ✅ prefix, 1 blank line after. - - Args: - message: Success message text (✅ will be automatically prepended) - """ - # Clear failure message if present (mutually exclusive) - if self._has_failure_message: - self._clear_final_messages() - self._has_success_message = True - self._has_failure_message = False - - self._add_blank_to_section("final_message") - # Automatically prepend ✅ if not already present - if not message.startswith("✅ "): - message = f"✅ {message}" - colored_msg = colorize(message, COLOR_GREEN, self.no_color) - self._add_to_section("final_message", colored_msg) - self._add_blank_to_section("final_message") - - def add_failure_message(self, message): - """ - Add failure message with proper spacing. - - Adds: 1 blank line before, message with ❌ prefix, 1 blank line after. 
- - Args: - message: Failure message text (❌ will be automatically prepended) - """ - # Clear success message if present (mutually exclusive) - if self._has_success_message: - self._clear_final_messages() - self._has_success_message = False - self._has_failure_message = True - - self._add_blank_to_section("final_message") - # Automatically prepend ❌ if not already present - if not message.startswith("❌ "): - message = f"❌ {message}" - colored_msg = colorize(message, COLOR_RED, self.no_color) - self._add_to_section("final_message", colored_msg) - self._add_blank_to_section("final_message") - - def add_errors_header(self): - """ - Add errors section header (standardized, like Summary). - - Note: This only adds the header. The header will only be displayed - if there are actual error lines (not just the header itself). - """ - if self._errors_header_added: - return # Already added, avoid duplicates - self._has_errors = True - separator = "=" * SEPARATOR_WIDTH - self._add_to_section("error", separator, line_type=self.LINE_ERROR) - self._add_to_section("error", "Errors", line_type=self.LINE_ERROR) - self._add_to_section("error", separator, line_type=self.LINE_ERROR) - self._errors_header_added = True - - def add_warnings_header(self): - """Add warnings section header (standardized, like Summary).""" - if self._warnings_header_added: - return # Already added, avoid duplicates - self._has_warnings = True - separator = "=" * SEPARATOR_WIDTH - self._add_to_section("warning", separator, line_type=self.LINE_WARNING) - self._add_to_section("warning", "Warnings", line_type=self.LINE_WARNING) - self._add_to_section("warning", separator, line_type=self.LINE_WARNING) - self._warnings_header_added = True - - def add_separator(self, section="summary"): - """ - Add a separator line to the specified section. 
- - Args: - section: Section to add separator to (default: "summary") - """ - separator = "=" * SEPARATOR_WIDTH - if section == "error": - line_type = self.LINE_ERROR - elif section == "warning": - line_type = self.LINE_WARNING - else: - line_type = self.LINE_INFO - self._add_to_section(section, separator, line_type=line_type) - - def add_line(self, line, line_type=LINE_INFO, verbose_only=False, section="summary"): - """ - Add a raw line to output. - - Args: - line: Line text to add - line_type: Type of line ('info', 'error', 'warning', 'verbose') - verbose_only: If True, only include this line when verbose=True - section: Section to add to ('header', 'working_verbose', 'summary', - 'warning', 'error', 'final_message') - """ - self._add_to_section(section, line, line_type=line_type, verbose_only=verbose_only) - - def add_error_line(self, line, verbose_only=False): - """ - Add an error line to output. - - Automatically adds errors header if not already added. - - Args: - line: Line text to add - verbose_only: If True, only include this line when verbose=True - """ - self._has_errors = True - # Automatically add errors header if not already added - if not self._errors_header_added: - self.add_errors_header() - self._add_to_section("error", line, line_type=self.LINE_ERROR, verbose_only=verbose_only) - - def add_warning_line(self, line, verbose_only=False): - """ - Add a warning line to output. - - Args: - line: Line text to add - verbose_only: If True, only include this line when verbose=True - """ - self._has_warnings = True - # Automatically add warnings header if not already added - if not self._warnings_header_added: - self.add_warnings_header() - self._add_to_section( - "warning", line, line_type=self.LINE_WARNING, verbose_only=verbose_only - ) - - def add_verbose_line(self, line, line_type=LINE_INFO): - """ - Add a verbose-only line to working verbose output section. - - If verbose=True, prints line immediately (after ensuring header is printed). 
- Otherwise buffers it. - - Args: - line: Line text to add - line_type: Type of line ('info', 'error', 'warning') - """ - # Store in working_verbose section for final output - self._add_to_section("working_verbose", line, line_type=line_type, verbose_only=True) - - # If verbose, print immediately (after header if needed) - if self.verbose: - if not self._header_printed: - # Header hasn't been printed yet, but we're trying to stream - # This shouldn't happen if scripts call add_header first, but handle gracefully - pass - print(line) - if self.output_file: - self._streamed_lines.append(line) - - def add_blank_line(self, section="summary"): - """ - Add a blank line to a specific section. - - If verbose=True and section is "working_verbose", prints immediately. - Otherwise buffers it. - - Args: - section: Section to add blank line to - """ - self._add_blank_to_section(section) - - # If verbose and this is working_verbose, print immediately - if self.verbose and section == "working_verbose" and self._header_printed: - print("") - if self.output_file: - self._streamed_lines.append("") - # Track that we've streamed this blank verbose line - self._streamed_verbose_count += 1 - - def _filter_section(self, section_lines, section_metadata): - """ - Filter a section's lines based on verbose mode. - - Args: - section_lines: List of lines in the section - section_metadata: List of (line_type, verbose_only) tuples - - Returns: - Filtered list of lines - """ - filtered = [] - for line, (line_type, verbose_only) in zip(section_lines, section_metadata): - if not verbose_only or self.verbose: - filtered.append(line) - return filtered - - def _get_ordered_sections(self): - """ - Get all sections in the correct order with filtering applied. - - Header and summary are only included if verbose=True OR if there are warnings/errors. 
- - Returns: - List of lines in correct order: header, working_verbose, summary, - warning, error, final_message - """ - all_lines = [] - - # Check if we should show header and summary - show_header_summary = self.verbose or self._has_warnings or self._has_errors - - # 1. Header (only if verbose or has warnings/errors) - if show_header_summary: - all_lines.extend(self._filter_section(self.header_lines, self.header_metadata)) - - # 2. Working verbose output - working_verbose = self._filter_section( - self.working_verbose_lines, self.working_verbose_metadata - ) - if working_verbose: - all_lines.extend(working_verbose) - - # 3. Summary (only if verbose or has warnings/errors) - if show_header_summary: - all_lines.extend(self._filter_section(self.summary_lines, self.summary_metadata)) - - # 4. Warnings (with header if any warnings exist) - warnings = self._filter_section(self.warning_lines, self.warning_metadata) - if warnings: - # Check if warnings header already exists (from add_warnings_header) - # Look for separator line followed by "Warnings" - has_warnings_header = False - separator = "=" * SEPARATOR_WIDTH - for i in range(min(3, len(warnings))): - if warnings[i] == separator: - if i + 1 < len(warnings) and warnings[i + 1].strip() == "Warnings": - has_warnings_header = True - break - - if not has_warnings_header: - # Add standard warnings header - all_lines.append(separator) - all_lines.append("Warnings") - all_lines.append(separator) - all_lines.extend(warnings) - - # 5. 
Errors (with header if any errors exist, but skip if only headers/separators) - errors = self._filter_section(self.error_lines, self.error_metadata) - # Filter out empty sections (only headers/separators, no actual error content) - non_header_errors = [ - line for line in errors - if line.strip() and line.strip() != "Errors" and - not (line == "=" * SEPARATOR_WIDTH) - ] - if non_header_errors: - # Check if errors header already exists (from add_errors_header) - # Look for separator line followed by "Errors" - has_errors_header = False - separator = "=" * SEPARATOR_WIDTH - for i in range(min(3, len(errors))): - if errors[i] == separator: - if i + 1 < len(errors) and errors[i + 1].strip() == "Errors": - has_errors_header = True - break - - if not has_errors_header: - # Add standard errors header - all_lines.append(separator) - all_lines.append("Errors") - all_lines.append(separator) - all_lines.extend(errors) - - # 6. Final messages - all_lines.extend( - self._filter_section(self.final_message_lines, self.final_message_metadata) - ) - - return all_lines - - def print(self): - """ - Print all lines to stdout and optionally to file. - - Outputs sections in correct order: header, working_verbose, summary, - warning, error, final_message. - Filters lines based on verbose mode before printing. - - If verbose=True, header and working_verbose have already been streamed, - so only prints summary, warnings, errors, and final messages. - - After printing, clears all sections. 
- """ - all_lines = self._get_ordered_sections() - if not all_lines: - return - - # If verbose, we've already streamed header and working_verbose - # Only print the remaining sections (summary, warnings, errors, final_message) - if self.verbose and self._header_printed: - # Skip header and working_verbose (already streamed) - # Count how many lines to skip: header + working_verbose (after filtering) - filtered_header = self._filter_section(self.header_lines, self.header_metadata) - filtered_verbose = self._filter_section( - self.working_verbose_lines, self.working_verbose_metadata - ) - skip_count = len(filtered_header) + len(filtered_verbose) - - # Find where summary starts (look for separator or summary items) - separator = "=" * SEPARATOR_WIDTH - summary_start = None - - # First try to find "Summary" header - for i, line in enumerate(all_lines): - if (line == separator and i + 1 < len(all_lines) and - all_lines[i + 1].strip() == "Summary"): - summary_start = i - break - - # If no summary header found, use skip_count to skip header + verbose - if summary_start is not None: - remaining_lines = all_lines[summary_start:] - elif skip_count > 0: - # Skip the header and working_verbose lines that were already streamed - # Ensure we don't skip more than available lines - if skip_count < len(all_lines): - remaining_lines = all_lines[skip_count:] - else: - # If skip_count is >= len(all_lines), everything was already streamed - remaining_lines = [] - else: - # Fallback: try to find first summary-like line (label: value pattern) - # or first non-header/verbose line - for i, line in enumerate(all_lines): - # Skip separator lines that are part of header (first 3 lines) - if line == separator and i < 3: - continue - # Look for summary items (label: value) or section headers - has_colon_with_digit = ( - ':' in line and - any(c.isdigit() for c in line.split(':', 1)[-1].strip()) - ) - if has_colon_with_digit or (line == separator and i > 2): - summary_start = i - break - if 
summary_start is not None: - remaining_lines = all_lines[summary_start:] - else: - # Last resort: everything was already streamed - remaining_lines = [] - else: - # Not verbose or header not printed yet - print everything - # But ensure header is only included once - remaining_lines = all_lines - - if remaining_lines: - # Collapse consecutive blank lines (max 2 consecutive) - collapsed_lines = [] - prev_was_blank = False - for line in remaining_lines: - is_blank = (line == "" or line.strip() == "") - if is_blank: - # Only add blank line if previous line wasn't blank - if not prev_was_blank: - collapsed_lines.append("") - prev_was_blank = True - else: - collapsed_lines.append(line) - prev_was_blank = False - - output_text = "\n".join(collapsed_lines) - output_text += "\n" # Final newline - print(output_text, end="") - - if self.output_file: - # Append collapsed lines to streamed lines for file output - self._streamed_lines.extend(collapsed_lines) - - # Write to file if specified - if self.output_file: - try: - with open(self.output_file, 'w', encoding='utf-8') as f: - # Combine streamed lines and remaining lines, remove color codes - import re - all_file_lines = self._streamed_lines + remaining_lines - if all_file_lines: - file_text = "\n".join(all_file_lines) - file_text += "\n" # Final newline - file_text = re.sub(r'\033\[[0-9;]*m', '', file_text) - f.write(file_text) - except IOError as e: - print( - f"Error: Cannot write to output file {self.output_file}: {e}", - file=sys.stderr - ) - - # Clear all sections - self.header_lines = [] - self.header_metadata = [] - self.working_verbose_lines = [] - self.working_verbose_metadata = [] - self.summary_lines = [] - self.summary_metadata = [] - self.warning_lines = [] - self.warning_metadata = [] - self.error_lines = [] - self.error_metadata = [] - self.final_message_lines = [] - self.final_message_metadata = [] - self._last_was_blank = {} - self._header_printed = False - self._streamed_lines = [] - - def 
print_preview(self): - """ - Print all current lines to stdout without clearing. - - Intended for showing output before interactive prompts. - """ - all_lines = self._get_ordered_sections() - if not all_lines: - return - output_text = "\n".join(all_lines) - output_text += "\n" - print(output_text, end="") - - def get_lines(self, filter_verbose=True): - """ - Get all lines as a list in correct order (for custom processing). - - Args: - filter_verbose: If True, filter based on verbose mode - - Returns: - List of output lines in correct order - """ - if filter_verbose: - return self._get_ordered_sections() - # If not filtering, combine all sections in order - all_lines = [] - all_lines.extend(self.header_lines) - all_lines.extend(self.working_verbose_lines) - all_lines.extend(self.summary_lines) - all_lines.extend(self.warning_lines) - all_lines.extend(self.error_lines) - all_lines.extend(self.final_message_lines) - return all_lines - - def get_exit_code(self, no_fail=False): - """ - Get the appropriate exit code based on errors found. 
- - Args: - no_fail: If True, always return 0 (even if errors were found) - - Returns: - 0 if no errors found or no_fail is True, 1 if errors were found - """ - if no_fail: - return 0 - return 0 if not self._has_errors else 1 - - def _clear_final_messages(self): - """Clear final message section (used when switching between success/failure).""" - self.final_message_lines = [] - self.final_message_metadata = [] - - def clear(self): - """Clear all accumulated lines from all sections.""" - self.header_lines = [] - self.header_metadata = [] - self.working_verbose_lines = [] - self.working_verbose_metadata = [] - self.summary_lines = [] - self.summary_metadata = [] - self.warning_lines = [] - self.warning_metadata = [] - self.error_lines = [] - self.error_metadata = [] - self.final_message_lines = [] - self.final_message_metadata = [] - self._last_was_blank = {} - self._has_success_message = False - self._has_failure_message = False - self._errors_header_added = False - self._warnings_header_added = False - - -def is_in_dot_directory(path: Path) -> bool: - """ - Check if a path contains any directory starting with '.'. - - Args: - path: Path object to check - - Returns: - True if path contains any directory starting with '.' (except '.' itself), False otherwise - """ - for part in path.parts: - if part.startswith('.') and part != '.': - return True - return False - - -def find_markdown_files( - target_paths: Optional[List[str]] = None, - root_dir: Optional[Path] = None, - default_dir: Optional[Path] = None, - exclude_dirs: Optional[Set[str]] = None, - verbose: bool = False, - return_strings: bool = False -) -> List[Path]: - """ - Find markdown files in the repository or target paths. 
- - Args: - target_paths: Optional list of specific files or directories to check - root_dir: Root directory to search from (when target_paths is None) - default_dir: Default directory to search if target_paths is None and root_dir is None - exclude_dirs: Set of directory names to exclude when scanning root_dir - verbose: Whether to show detailed progress - return_strings: If True, return list of strings instead of Path objects - - Returns: - List of Path objects (or strings if return_strings=True) for markdown files found - """ - md_files = [] - default_exclude_dirs = { - 'node_modules', 'vendor', 'tmp', '.git', '.venv', 'venv', - '__pycache__', '.pytest_cache', 'dist', 'build', - '.idea', '.vscode', '.cache' - } - if exclude_dirs is None: - exclude_dirs = default_exclude_dirs - - if target_paths: - for target_path in target_paths: - target = Path(target_path) - if not target.exists(): - if verbose: - print( - f"Warning: Target path does not exist: {target_path}", - file=sys.stderr - ) - continue - - if target.is_file(): - if target.suffix == '.md' and not is_in_dot_directory(target): - md_files.append(target) - else: - if verbose: - print( - f"Warning: Target file is not a markdown file: {target_path}", - file=sys.stderr - ) - else: - # Recursively find markdown files in target directory - for md_file in target.rglob('*.md'): - if not is_in_dot_directory(md_file): - md_files.append(md_file) - else: - # Determine which directory to search - search_dir = root_dir - if search_dir is None: - if default_dir is not None: - search_dir = default_dir - else: - search_dir = Path('.') - - if not search_dir.exists(): - if verbose: - print(f"Error: Search directory does not exist: {search_dir}", file=sys.stderr) - return [] - - # If default_dir is specified, only search that directory (non-recursive for glob) - if default_dir is not None and root_dir is None: - md_files = [ - f for f in sorted(search_dir.glob('*.md')) - if not is_in_dot_directory(f) - ] - else: - # Recursive 
search with exclusions - for md_file in search_dir.rglob('*.md'): - # Check if any excluded directory is in the path - if any(excluded in md_file.parts for excluded in exclude_dirs): - continue - # Also exclude dot directories - if is_in_dot_directory(md_file): - continue - md_files.append(md_file) - - if return_strings: - return sorted([str(f) for f in md_files]) - return sorted(md_files) - - -def find_feature_files( - target_paths: Optional[List[str]] = None, - root_dir: Optional[Path] = None, - default_dir: Optional[Path] = None, - exclude_dirs: Optional[Set[str]] = None, - verbose: bool = False, - return_strings: bool = False -) -> List[Path]: - """ - Find feature files (.feature) in the repository or target paths. - - Args: - target_paths: Optional list of specific files or directories to check - root_dir: Root directory to search from (when target_paths is None) - default_dir: Default directory to search if target_paths is None and root_dir is None - exclude_dirs: Set of directory names to exclude when scanning root_dir - verbose: Whether to show detailed progress - return_strings: If True, return list of strings instead of Path objects - - Returns: - List of Path objects (or strings if return_strings=True) for feature files found - """ - feature_files = [] - default_exclude_dirs = { - 'node_modules', 'vendor', 'tmp', '.git', '.venv', 'venv', - '__pycache__', '.pytest_cache', 'dist', 'build', - '.idea', '.vscode', '.cache' - } - if exclude_dirs is None: - exclude_dirs = default_exclude_dirs - - if target_paths: - for target_path in target_paths: - target = Path(target_path) - if not target.exists(): - if verbose: - print( - f"Warning: Target path does not exist: {target_path}", - file=sys.stderr - ) - continue - - if target.is_file(): - if target.suffix == '.feature' and not is_in_dot_directory(target): - feature_files.append(target) - else: - if verbose: - print( - f"Warning: Target file is not a .feature file: {target_path}", - file=sys.stderr - ) - else: - 
# Recursively find feature files in target directory - for feature_file in target.rglob('*.feature'): - if not is_in_dot_directory(feature_file): - feature_files.append(feature_file) - else: - # Determine which directory to search - search_dir = root_dir - if search_dir is None: - if default_dir is not None: - search_dir = default_dir - else: - search_dir = Path('.') - - if not search_dir.exists(): - if verbose: - print(f"Error: Search directory does not exist: {search_dir}", file=sys.stderr) - return [] - - # If default_dir is specified, search that directory recursively - if default_dir is not None and root_dir is None: - feature_files = [ - f for f in sorted(search_dir.rglob('*.feature')) - if not is_in_dot_directory(f) - and not any(excluded in f.parts for excluded in exclude_dirs) - ] - else: - # Recursive search with exclusions - for feature_file in search_dir.rglob('*.feature'): - # Check if any excluded directory is in the path - if any(excluded in feature_file.parts for excluded in exclude_dirs): - continue - # Also exclude dot directories - if is_in_dot_directory(feature_file): - continue - feature_files.append(feature_file) - - if return_strings: - return sorted([str(f) for f in feature_files]) - return sorted(feature_files) - - -def get_validation_exit_code(has_errors, no_fail=False): - """ - Get the appropriate exit code for validation scripts. - - Args: - has_errors: True if validation errors were found, False otherwise - no_fail: If True, always return 0 (even if errors were found) - - Returns: - 0 if no errors found or no_fail is True, 1 if errors were found - - Note: - This function is for scripts that track errors separately from OutputBuilder. - Scripts using OutputBuilder should use output.get_exit_code(no_fail) instead. - """ - if no_fail: - return 0 - return 0 if not has_errors else 1 - - -def get_workspace_root() -> Path: - """ - Get the workspace root directory (parent of scripts directory). 
-
-    Returns:
-        Path to workspace root
-    """
-    script_dir = Path(__file__).parent
-    return script_dir.parent.parent
-
-
-def import_module_with_fallback(module_name: str, script_dir: Path) -> types.ModuleType:
-    """
-    Import a module by name.
-
-    Args:
-        module_name: Name of module to import (e.g., '_validation_utils')
-        script_dir: Directory containing the module file (unused)
-
-    Returns:
-        Imported module
-    """
-    return importlib.import_module(module_name)
-
-
-def parse_paths(path_str: Optional[str]) -> Optional[List[str]]:
-    """
-    Parse comma-separated path string into list of paths.
-
-    Args:
-        path_str: Comma-separated string of paths, or None
-
-    Returns:
-        List of trimmed path strings, or None if path_str is None/empty
-    """
-    if not path_str:
-        return None
-    return [p.strip() for p in path_str.split(',') if p.strip()]
-
-
-@dataclass(frozen=True)
-class HeadingContext:
-    """
-    Context information about a markdown heading.
-
-    Used to track heading information for code blocks and signatures.
-    """
-    heading_text: str  # The heading text (without # markers)
-    heading_level: int  # Heading depth (1-6, where 1 is most general)
-    heading_line: int  # Line number of the heading (1-indexed)
-    file_path: Optional[str] = None  # Optional file path for context
-
-
-@dataclass
-class ProseSection:
-    """
-    Represents a prose-only section in a markdown document.
-
-    This supports Overview blocks and prose subsections in index-style documents.
-    """
-
-    heading_str: str
-    heading_num: Optional[str]
-    heading_level: int
-    heading_line: Optional[int]
-    content: str
-    parent_section: Optional["ProseSection"] = None
-    child_sections: List["ProseSection"] = None
-    has_code: bool = False
-    code_blocks: List[Tuple[int, int, str]] = None
-    file_path: Optional[str] = None
-    lines: Optional[Tuple[int, int]] = None
-
-    def __post_init__(self) -> None:
-        if self.child_sections is None:
-            self.child_sections = []
-        if self.code_blocks is None:
-            self.code_blocks = []
-        if self.heading_num is not None:
-            if not isinstance(self.heading_num, str):
-                raise ValueError("heading_num must be a string or None")
-            if not _RE_HEADING_NUM_PATTERN.match(self.heading_num):
-                raise ValueError(
-                    "heading_num must be a dotted number like '1', '2.4', or '3.5.6', got: %r"
-                    % (self.heading_num,)
-                )
-
-    def path_label(self) -> str:
-        parts: List[str] = []
-        cur: Optional["ProseSection"] = self
-        while cur is not None:
-            parts.append(cur.heading_str)
-            cur = cur.parent_section
-        parts.reverse()
-        return " > ".join(parts)
-
-
-def extract_headings(content: str, skip_code_blocks: bool = True) -> List[Tuple[str, int, int]]:
-    """
-    Extract all headings from markdown content.
-
-    Args:
-        content: Markdown content as string
-        skip_code_blocks: If True, skip headings inside code blocks
-
-    Returns:
-        List of tuples: (heading_text, heading_level, line_number)
-        Lines are 1-indexed.
-    """
-    headings: List[Tuple[str, int, int]] = []
-    lines = content.split('\n')
-    in_code_block = False
-
-    for i, line in enumerate(lines, 1):
-        stripped_line = line.strip()
-
-        if skip_code_blocks:
-            # Check for code block boundaries
-            if stripped_line.startswith('```'):
-                in_code_block = not in_code_block
-                continue
-
-            # Skip lines inside code blocks
-            if in_code_block:
-                continue
-
-        # Match markdown headings (# through ######)
-        match = _RE_HEADING_PATTERN.match(stripped_line)
-        if match:
-            heading_level = len(match.group(1))
-            heading_text = match.group(2).strip()
-            headings.append((heading_text, heading_level, i))
-    return headings
-
-
-def extract_headings_from_file(
-    file_path: Path, skip_code_blocks: bool = True, file_cache: Optional['FileContentCache'] = None
-) -> List[Tuple[str, int, int]]:
-    """
-    Extract all headings from a markdown file.
-
-    Args:
-        file_path: Path to the markdown file
-        skip_code_blocks: If True, skip headings inside code blocks
-        file_cache: Optional FileContentCache instance to use for reading files
-
-    Returns:
-        List of tuples: (heading_text, heading_level, line_number)
-        Lines are 1-indexed.
-    """
-    try:
-        if file_cache:
-            content = file_cache.get_content(file_path)
-        else:
-            with open(file_path, 'r', encoding='utf-8') as f:
-                content = f.read()
-        return extract_headings(content, skip_code_blocks=skip_code_blocks)
-    except Exception as e:
-        print(f"Error reading {file_path}: {e}", file=sys.stderr)
-        return []
-
-
-def extract_headings_with_anchors(
-    file_path: Path, min_level: int = 1, max_level: int = 6,
-    skip_code_blocks: bool = True, file_cache: Optional['FileContentCache'] = None
-) -> Dict[str, Tuple[str, int, int]]:
-    """
-    Extract all headings from a markdown file and generate anchors.
-
-    Args:
-        file_path: Path to the markdown file
-        min_level: Minimum heading level to include (1-6, default: 1)
-        max_level: Maximum heading level to include (1-6, default: 6)
-        skip_code_blocks: If True, skip headings inside code blocks
-        file_cache: Optional FileContentCache instance to use for reading files
-
-    Returns:
-        Dictionary mapping anchor -> (heading_text, heading_level, line_number)
-    """
-    headings_dict = {}
-    headings = extract_headings_from_file(
-        file_path, skip_code_blocks=skip_code_blocks, file_cache=file_cache
-    )
-    for heading_text, heading_level, line_num in headings:
-        if min_level <= heading_level <= max_level:
-            anchor = generate_anchor_from_heading(heading_text, include_hash=False)
-            headings_dict[anchor] = (heading_text, heading_level, line_num)
-    return headings_dict
-
-
-def extract_h2_plus_headings_with_sections(
-    file_path: Path, skip_code_blocks: bool = True,
-    file_cache: Optional['FileContentCache'] = None
-) -> List[Tuple[int, str, int, str, Optional[str]]]:
-    """
-    Extract H2+ headings (## through ######) with anchors and section numbers.
-
-    Args:
-        file_path: Path to the markdown file
-        skip_code_blocks: If True, skip headings inside code blocks
-        file_cache: Optional FileContentCache instance to use for reading files
-
-    Returns:
-        List of tuples: (heading_level, heading_text, line_num, anchor, section_anchor)
-        where:
-        - heading_level is 2 for ##, 3 for ###, etc.
-        - anchor is the plain anchor from heading text
-        - section_anchor is the anchor with section number prefix (if section number exists)
-    """
-    headings_list = []
-    headings = extract_headings_from_file(
-        file_path, skip_code_blocks=skip_code_blocks, file_cache=file_cache
-    )
-    for heading_text, heading_level, line_num in headings:
-        # Only include H2+ headings (level 2-6)
-        if heading_level < 2:
-            continue
-
-        # Extract section number if present (e.g., "1.2.3 Heading" -> "1.2.3")
-        section_match = re.match(r'^(\d+(?:\.\d+)*)\s+(.+)$', heading_text)
-        section_anchor = None
-
-        if section_match:
-            # Heading has section number: "1.2.3 Heading Text"
-            section_num = section_match.group(1)
-            section_num_no_dots = section_num.replace('.', '')
-            heading_text_without_section = section_match.group(2).strip()
-            # Generate anchor from heading text without section number
-            anchor = generate_anchor_from_heading(heading_text_without_section, include_hash=False)
-            # Section anchor: section_num-anchor (e.g., "123-heading-text")
-            section_anchor = f"{section_num_no_dots}-{anchor}"
-        else:
-            # Heading has no section number: just generate anchor from text
-            anchor = generate_anchor_from_heading(heading_text, include_hash=False)
-
-        headings_list.append((heading_level, heading_text, line_num, anchor, section_anchor))
-    return headings_list
-
-
-def extract_headings_with_section_numbers(
-    file_path: Path, min_level: int = 2, max_level: int = 6,
-    skip_code_blocks: bool = True, file_cache: Optional['FileContentCache'] = None
-) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]:
-    """
-    Parse markdown file to extract all heading anchors and section numbers.
-
-    Args:
-        file_path: Path to the markdown file
-        min_level: Minimum heading level to include (1-6, default: 2 for H2+)
-        max_level: Maximum heading level to include (1-6, default: 6)
-        skip_code_blocks: If True, skip headings inside code blocks
-        file_cache: Optional FileContentCache instance to use for reading files
-
-    Returns:
-        Tuple of (anchors set, sections dict where key is section_num and
-        value is (heading_text, anchor))
-    """
-    anchors = set()
-    sections = {}  # section_num -> (heading_text, anchor)
-
-    if not file_path.exists():
-        return anchors, sections
-
-    headings = extract_headings_from_file(
-        file_path, skip_code_blocks=skip_code_blocks, file_cache=file_cache
-    )
-    for heading_text, heading_level, line_num in headings:
-        if min_level <= heading_level <= max_level:
-            # Generate anchor from heading text (without '#' prefix)
-            anchor = generate_anchor_from_heading(heading_text, include_hash=False)
-            anchors.add(anchor)
-
-            # Extract section number if present (e.g., "2.1 AddFile Package Method" -> "2.1")
-            section_match = re.match(r'^(\d+(?:\.\d+)*)', heading_text)
-            if section_match:
-                section_num = section_match.group(1)
-                sections[section_num] = (heading_text, anchor)
-
-    return anchors, sections
-
-
-def find_heading_before_line(
-    content: str, line_num: int, prefer_deepest: bool = True
-) -> Optional[HeadingContext]:
-    """
-    Find the heading context for a given line number in markdown content.
-
-    Args:
-        content: Markdown content as string
-        line_num: Target line number (1-indexed)
-        prefer_deepest: If True, return the most specific (deepest) heading.
-            If False, return the most recent heading.
-
-    Returns:
-        HeadingContext if a heading is found before the line, None otherwise.
-    """
-    lines = content.split('\n')
-
-    if line_num < 1 or line_num > len(lines):
-        return None
-
-    # Find the most recent heading before this line
-
-    if prefer_deepest:
-        # Track heading stack to find the most specific heading
-        heading_stack = []  # List of (level, text, line_num) tuples
-
-        for i, line in enumerate(lines[:line_num], 1):
-            match = _RE_HEADING_PATTERN.match(line.strip())
-            if match:
-                level = len(match.group(1))
-                text = match.group(2).strip()
-                # Remove headings at same or deeper level from stack
-                heading_stack = [h for h in heading_stack if h[0] < level]
-                # Add this heading
-                heading_stack.append((level, text, i))
-
-        # Get the most specific (deepest) heading
-        if heading_stack:
-            last_heading_level, last_heading, last_heading_line = heading_stack[-1]
-            return HeadingContext(
-                heading_text=last_heading,
-                heading_level=last_heading_level,
-                heading_line=last_heading_line
-            )
-    else:
-        # Find the most recent heading (not necessarily deepest)
-        for i in range(line_num - 1, -1, -1):
-            if i < len(lines):
-                match = _RE_HEADING_PATTERN.match(lines[i].strip())
-                if match:
-                    level = len(match.group(1))
-                    text = match.group(2).strip()
-                    return HeadingContext(
-                        heading_text=text,
-                        heading_level=level,
-                        heading_line=i + 1
-                    )
-
-    return None
-
-
-def find_heading_for_code_block(
-    content: str, code_block_start_line: int
-) -> Optional[str]:
-    """
-    Find the heading text that appears before a code block.
-
-    This is a simpler version that just returns the heading text,
-    useful for cases where only the text is needed.
-
-    Args:
-        content: Markdown content as string
-        code_block_start_line: Line number where the code block starts (1-indexed)
-
-    Returns:
-        Heading text if found, None otherwise.
-    """
-    ctx = find_heading_before_line(content, code_block_start_line, prefer_deepest=False)
-    return ctx.heading_text if ctx else None
-
-
-def get_common_abbreviations() -> Set[str]:
-    """
-    Get comprehensive list of common abbreviations (case-insensitive matching).
-
-    Returns:
-        Set of abbreviations (all lowercase for case-insensitive matching)
-    """
-    return {
-        # Titles
-        'dr.', 'mr.', 'mrs.', 'ms.', 'prof.',
-        # Academic degrees
-        'ph.d.', 'm.d.', 'b.a.', 'm.a.', 'b.s.', 'm.s.',
-        # Common abbreviations
-        'etc.', 'i.e.', 'e.g.', 'vs.', 'a.m.', 'p.m.',
-        # Business/location
-        'inc.', 'ltd.', 'corp.', 'st.', 'ave.', 'blvd.',
-    }
-
-
-def contains_url(text: str) -> bool:
-    """
-    Check if text contains a URL.
-
-    Detects:
-    - http:// and https:// URLs
-    - www. URLs (with word boundaries)
-    - mailto: links
-
-    Args:
-        text: Text to check
-
-    Returns:
-        True if text contains a URL, False otherwise
-    """
-    url_patterns = [
-        r'https?://',  # http:// or https://
-        r'\bwww\.',  # www. with word boundary
-        r'mailto:',  # mailto: links
-    ]
-    for pattern in url_patterns:
-        if re.search(pattern, text, re.IGNORECASE):
-            return True
-    return False
-
-
-def count_sentences(text: str) -> int:
-    """
-    Count sentences in text, handling edge cases.
-
-    Splits on sentence-ending punctuation (., !, ?) followed by space/newline.
-    Handles edge cases:
-    - Abbreviations: using get_common_abbreviations() (normalize both to lowercase for comparison)
-    - Decimals: \\d+\\.\\d+ pattern
-    - URLs: using contains_url function
-    - Ellipses: ... and Unicode ellipsis (…)
-    - Hybrid approach: period + uppercase next char AND not in abbreviation list (case-insensitive)
-
-    Args:
-        text: Text to count sentences in
-
-    Returns:
-        Number of sentences (0 for empty/whitespace text, minimum 1 if text is non-empty)
-    """
-    if not text or not text.strip():
-        return 0
-
-    abbreviations = get_common_abbreviations()
-    text_lower = text.lower()
-
-    # Check if text contains URLs
-    has_urls = contains_url(text)
-
-    # Pattern for ellipses (not currently used but kept for potential future use)
-    # ellipsis_pattern = re.compile(r'\.\.\.|…')
-
-    # Split text into potential sentences using regex
-    # Match sentence-ending punctuation followed by whitespace or end of string
-
-    # Find all potential sentence endings
-    matches = list(_RE_SENTENCE_END_PATTERN.finditer(text))
-    if not matches:
-        # No sentence-ending punctuation found, but text exists
-        return 1 if text.strip() else 0
-
-    sentences = []
-    last_end = 0
-
-    for match in matches:
-        punct_pos = match.start()
-        punct_end = match.end()
-        # Adjust punct_end to include the whitespace if present
-        if punct_end < len(text) and text[punct_end].isspace():
-            # Skip whitespace
-            while punct_end < len(text) and text[punct_end].isspace():
-                punct_end += 1
-
-        # Check if this period/exclamation/question mark is part of an ellipsis
-        if punct_pos > 0 and punct_pos + 2 < len(text):
-            if text[punct_pos - 1:punct_pos + 2] == '...' or text[punct_pos:punct_pos + 3] == '...':
-                continue
-        if punct_pos + 1 < len(text) and text[punct_pos:punct_pos + 2] == '..':
-            continue
-
-        # Check if this is part of a decimal number
-        context_start = max(0, punct_pos - 10)
-        context_end = min(len(text), punct_pos + 10)
-        context = text[context_start:context_end]
-        if _RE_DECIMAL_PATTERN.search(context):
-            continue
-
-        # Check if this is part of a URL
-        if has_urls:
-            url_context_start = max(0, punct_pos - 30)
-            url_context_end = min(len(text), punct_pos + 30)
-            url_context = text[url_context_start:url_context_end]
-            if contains_url(url_context):
-                continue
-
-        # Check if this is an abbreviation
-        # Look backwards to find the word before the punctuation
-        word_start = punct_pos
-        while word_start > 0 and (text[word_start - 1].isalnum() or text[word_start - 1] == '.'):
-            word_start -= 1
-
-        word_before = text_lower[word_start:punct_pos + 1]
-        if word_before in abbreviations:
-            continue
-
-        # Check hybrid approach: if period and next char is uppercase, likely sentence end
-        if text[punct_pos] == '.' and punct_end < len(text):
-            # Find next non-whitespace character
-            next_char_pos = punct_end
-            while next_char_pos < len(text) and text[next_char_pos].isspace():
-                next_char_pos += 1
-            if next_char_pos < len(text):
-                next_char = text[next_char_pos]
-                # If uppercase and not an abbreviation, it's a sentence end
-                if next_char.isupper() and word_before not in abbreviations:
-                    # This is a sentence end
-                    sentence = text[last_end:punct_end].strip()
-                    if sentence:
-                        sentences.append(sentence)
-                    last_end = punct_end
-                    continue
-
-        # Regular sentence ending (followed by whitespace)
-        sentence = text[last_end:punct_end].strip()
-        if sentence:
-            sentences.append(sentence)
-        last_end = punct_end
-
-    # Add remaining text as a sentence if any
-    if last_end < len(text):
-        remaining = text[last_end:].strip()
-        if remaining:
-            sentences.append(remaining)
-
-    # Filter out empty sentences
-    sentences = [s for s in sentences if s]
-
-    return len(sentences) if sentences else (1 if text.strip() else 0)
-
-
-def has_code_blocks(content: str, exclude_languages: Optional[Set[str]] = None) -> bool:
-    """
-    Check if content contains code blocks (any language, excluding specified).
-
-    Extracts first word from language identifier by splitting on any non-alpha character.
-    Examples: "go example" -> "go", "rust,no_run" -> "rust", "c++" -> "c"
-
-    Args:
-        content: Markdown content to check
-        exclude_languages: Optional set of language identifiers to exclude
-            (e.g., {'text', 'markdown'})
-
-    Returns:
-        True if content contains code blocks (excluding specified languages)
-    """
-    lines = content.split('\n')
-    in_code_block = False
-    code_block_language = None
-
-    for line in lines:
-        stripped = line.strip()
-        if stripped.startswith('```'):
-            if in_code_block:
-                # Closing code block
-                in_code_block = False
-                code_block_language = None
-            else:
-                # Opening code block
-                in_code_block = True
-                # Extract language identifier
-                language_part = stripped[3:].strip()
-                if language_part:
-                    # Split on any non-alpha character to get first token
-                    match = re.match(r'^([a-zA-Z]+)', language_part)
-                    if match:
-                        code_block_language = match.group(1).lower()
-                    else:
-                        code_block_language = None
-                else:
-                    code_block_language = None
-
-                # Check if this language should be excluded
-                if exclude_languages and code_block_language:
-                    if code_block_language in exclude_languages:
-                        # Skip this code block
-                        continue
-
-                # Found a code block that's not excluded
-                return True
-
-    return False
-
-
-def build_heading_hierarchy(
-    headings: List[Tuple[int, int, str]]  # (line_num, level, text)
-) -> Dict[int, Optional[int]]:
-    """
-    Build parent-child relationship mapping for headings.
-
-    Uses heading_stack approach similar to validate_heading_numbering.py.
-    Each heading finds its most recent parent at the appropriate level.
-
-    Args:
-        headings: List of (line_num, level, text) tuples, sorted by line_num
-
-    Returns:
-        Dict mapping heading index (0-based) -> parent heading index (None if no parent).
-        If H3+ appears before H2, it has no parent (None).
-    """
-    hierarchy = {}
-    heading_stack = {}  # Maps level -> heading_index (current parent at that level)
-
-    for idx, (line_num, level, text) in enumerate(headings):
-        parent_index = None
-        if level > 2:
-            # H3 and beyond need a parent
-            parent_level = level - 1
-            if parent_level in heading_stack:
-                parent_index = heading_stack[parent_level]
-
-        hierarchy[idx] = parent_index
-
-        # Update heading stack - set this heading as the current parent at its level
-        heading_stack[level] = idx
-
-        # Clear deeper levels when we move up in hierarchy
-        levels_to_clear = [lvl for lvl in heading_stack.keys() if lvl > level]
-        for lvl in levels_to_clear:
-            del heading_stack[lvl]
-
-    return hierarchy
-
-
-def get_subheadings(
-    heading_index: int,
-    heading_level: int,
-    all_headings: List[Tuple[int, int, str]],
-    hierarchy: Dict[int, Optional[int]]
-) -> List[int]:
-    """
-    Get all subheadings (all descendants at any level > heading_level) for a given heading.
-
-    Args:
-        heading_index: Index of the heading in all_headings list (0-based)
-        heading_level: Level of the heading
-        all_headings: List of (line_num, level, text) tuples
-        hierarchy: Parent-child mapping from build_heading_hierarchy
-
-    Returns:
-        List of indices (0-based) for all subheadings (all descendants at any level > heading_level)
-    """
-    subheadings = []
-
-    # Find all headings that are descendants of this heading
-    # A heading is a descendant if it has this heading in its ancestor chain
-    def is_descendant(child_idx: int) -> bool:
-        current = child_idx
-        while current is not None and current in hierarchy:
-            parent = hierarchy[current]
-            if parent == heading_index:
-                return True
-            current = parent
-        return False
-
-    for idx, (line_num, level, text) in enumerate(all_headings):
-        if idx != heading_index and level > heading_level:
-            if is_descendant(idx):
-                subheadings.append(idx)
-
-    return subheadings
-
-
-def is_organizational_heading(
-    content: str,
-    heading_line: int,
-    heading_level: int,
-    all_headings: List[Tuple[int, int, str]],
-    hierarchy: Dict[int, Optional[int]],
-    max_prose_lines: int = 5
-) -> dict:
-    """
-    Determine if a heading is purely organizational (grouping only).
-
-    A heading is organizational if:
-    - Has no code blocks (any language, except text/markdown)
-    - Has max_prose_lines or fewer sentences
-    - Only contains subheadings with no substantive content
-
-    Args:
-        content: Full markdown content
-        heading_line: Line number of the heading (1-indexed)
-        heading_level: Level of the heading (2-6)
-        all_headings: List of (line_num, level, text) tuples
-        hierarchy: Parent-child mapping from build_heading_hierarchy
-        max_prose_lines: Maximum sentences before considered non-organizational
-
-    Returns:
-        Dict with:
-        - is_organizational: bool - True if heading is organizational
-        - is_empty: bool - True if heading has no content (0 sentences),
-          False if it has minor informative content (1-5 sentences)
-        - sentence_count: int - Number of sentences in the section
-    """
-    # Find the heading index
-    heading_index = None
-    for idx, (line_num, level, text) in enumerate(all_headings):
-        if line_num == heading_line and level == heading_level:
-            heading_index = idx
-            break
-
-    if heading_index is None:
-        return {'is_organizational': False, 'is_empty': False, 'sentence_count': 0}
-
-    # Find next heading (any level) to determine section boundaries
-    next_heading_line = None
-    for line_num, level, text in all_headings:
-        if line_num > heading_line:
-            next_heading_line = line_num
-            break
-
-    # Extract section content
-    lines = content.split('\n')
-    if next_heading_line:
-        section_lines = lines[heading_line - 1:next_heading_line - 1]
-    else:
-        section_lines = lines[heading_line - 1:]
-
-    section_content = '\n'.join(section_lines)
-
-    # Extract prose (non-heading, non-code-block lines)
-    prose_lines = []
-    in_code_block = False
-    for line in section_lines[1:]:  # Skip the heading line itself
-        stripped = line.strip()
-        if stripped.startswith('```'):
-            in_code_block = not in_code_block
-            continue
-        if in_code_block:
-            continue
-        # Check if it's a heading
-        if re.match(r'^#{1,6}\s+', stripped):
-            continue
-        if stripped:
-            prose_lines.append(line)
-
-    prose_text = '\n'.join(prose_lines)
-
-    # Count sentences
-    sentence_count = count_sentences(prose_text)
-
-    # Check for code blocks (excluding text/markdown)
-    if has_code_blocks(section_content, exclude_languages={'text', 'markdown'}):
-        return {'is_organizational': False, 'is_empty': False, 'sentence_count': sentence_count}
-
-    # Get subheadings
-    subheadings = get_subheadings(heading_index, heading_level, all_headings, hierarchy)
-
-    # Return organizational if: no code blocks AND sentence_count <= max_prose_lines AND
-    # (sentence_count == 0 OR only subheadings)
-    if sentence_count <= max_prose_lines:
-        if sentence_count == 0 or (subheadings and len(prose_lines) <= max_prose_lines):
-            return {
-                'is_organizational': True,
-                'is_empty': (sentence_count == 0),
-                'sentence_count': sentence_count
-            }
-
-    return {'is_organizational': False, 'is_empty': False, 'sentence_count': sentence_count}
-
-
-def generate_anchor_from_heading(heading: str, include_hash: bool = False) -> str:
-    """
-    Generate a GitHub-style markdown anchor from heading text.
-
-    This function implements GitHub's markdown anchor generation algorithm:
-    - Removes backticks but preserves their content (e.g., `` `code` `` -> `code`)
-    - Converts to lowercase
-    - Removes special characters except word characters, spaces, and hyphens
-    - Collapses sequences of spaces and hyphens into a single hyphen
-    - Strips leading and trailing hyphens
-
-    Args:
-        heading: The heading text (may contain markdown formatting like backticks)
-        include_hash: If True, prefix the anchor with '#' (default: False)
-
-    Returns:
-        The generated anchor string (with '#' prefix if include_hash=True)
-
-    Examples:
-        >>> generate_anchor_from_heading("1.2.3 AddFile Package Method")
-        '123-addfile-package-method'
-        >>> generate_anchor_from_heading("File Management with `Package` type")
-        'file-management-with-package-type'
-        >>> generate_anchor_from_heading("Heading - With Multiple Spaces")
-        'heading-with-multiple-spaces'
-    """
-    if not heading:
-        return ""
-
-    # Remove markdown formatting (backticks) but preserve their content
-    # This matches GitHub's behavior: `` `code` `` becomes `code` in the anchor
-    heading_clean = re.sub(r'`([^`]+)`', r'\1', heading)
-
-    # Convert to lowercase
-    heading_lower = heading_clean.lower()
-
-    # Preserve " - " (space-hyphen-space) as "---" to match GitHub/markdownlint MD051
-    _placeholder = 'TRPLDASH'
-    heading_lower = heading_lower.replace(' - ', _placeholder)
-
-    # Remove special characters except word characters, spaces, and hyphens
-    anchor = re.sub(r'[^\w\s-]', '', heading_lower)
-
-    # Collapse sequences of spaces and hyphens into a single hyphen
-    anchor = re.sub(r'[-\s]+', '-', anchor)
-
-    # Restore "---" for " - " so slug matches GitHub/markdownlint
-    anchor = anchor.replace(_placeholder, '---')
-
-    # Strip leading and trailing hyphens
-    anchor = anchor.strip('-')
-
-    # Add '#' prefix if requested
-    if include_hash:
-        return '#' + anchor if anchor else ""
-    return anchor
-
-
-def remove_backticks_keep_content(text: str) -> str:
-    """
-    Remove backticks from text but keep their contents.
-
-    This removes the backtick characters but preserves the text that was
-    enclosed in backticks. This is the standard behavior for both validation scripts.
-
-    Args:
-        text: Text that may contain backticks
-
-    Returns:
-        Text with backticks removed but content preserved
-
-    Examples:
-        "Heading with `code` example" => "Heading with code example"
-        "`func()` and `var`" => "func() and var"
-        "`code`" => "code"
-        "No backticks here" => "No backticks here"
-    """
-    if not text:
-        return text
-
-    # Remove backticks but keep content
-    # Pattern matches backtick, captures content, matches closing backtick
-    result = re.sub(r'`([^`]*)`', r'\1', text)
-    return result
-
-
-def has_backticks(text: str) -> bool:
-    """
-    Check if text contains backticks.
-
-    Args:
-        text: Text to check for backticks
-
-    Returns:
-        True if text contains backticks, False otherwise
-
-    Examples:
-        "Heading with `code`" => True
-        "Plain text heading" => False
-        "" => False
-        None => False
-    """
-    if not text:
-        return False
-    return '`' in text
-
-
-def get_backticks_error_message() -> str:
-    """
-    Get the standard error message for backticks in headings.
-
-    Returns:
-        Standard error message string for backticks in headings
-    """
-    return ("Heading contains backticks. "
-            "Headings should not contain backticks; use plain text instead.")
-
-
-def is_safe_path(file_path: Path, repo_root: Path) -> bool:
-    """
-    Check if a path is safe (within repo and no traversal).
-
-    Args:
-        file_path: Path to check
-        repo_root: Repository root directory
-
-    Returns:
-        True if path is safe (within repo root), False otherwise
-    """
-    try:
-        # Resolve to absolute path and check it's within repo
-        resolved = file_path.resolve()
-        repo_resolved = repo_root.resolve()
-        # Check that resolved path is within repo root
-        return str(resolved).startswith(str(repo_resolved))
-    except (OSError, ValueError):
-        return False
-
-
-def validate_file_name(filename: str) -> bool:
-    """
-    Validate that filename is safe (no path traversal, no separators).
-
-    Args:
-        filename: Filename to validate
-
-    Returns:
-        True if filename is safe, False otherwise
-    """
-    if not filename:
-        return False
-    # No path separators allowed
-    if '/' in filename or '\\' in filename:
-        return False
-    # No parent directory references
-    if '..' in filename:
-        return False
-    # No null bytes
-    if '\x00' in filename:
-        return False
-    return True
-
-
-def validate_spec_file_name(spec_file: str) -> bool:
-    """
-    Validate that spec file name is safe (no path traversal, no separators, .md extension).
-
-    Args:
-        spec_file: Spec file name to validate
-
-    Returns:
-        True if spec file name is safe, False otherwise
-    """
-    if not spec_file:
-        return False
-    # Must be a simple filename with .md extension
-    # No path separators allowed
-    if '/' in spec_file or '\\' in spec_file:
-        return False
-    # No parent directory references
-    if '..' in spec_file:
-        return False
-    # Must end with .md
-    if not spec_file.endswith('.md'):
-        return False
-    # Must be a valid filename (alphanumeric, underscore, hyphen, dot)
-    if not re.match(r'^[a-zA-Z0-9_\-]+\.md$', spec_file):
-        return False
-    return True
-
-
-def validate_anchor(anchor: str) -> bool:
-    """
-    Validate that anchor is safe (no path traversal, no separators).
-
-    Args:
-        anchor: Anchor string to validate
-
-    Returns:
-        True if anchor is safe, False otherwise
-    """
-    if not anchor:
-        return True  # Empty anchor is OK
-    # No path separators allowed
-    if '/' in anchor or '\\' in anchor:
-        return False
-    # No parent directory references
-    if '..' in anchor:
-        return False
-    # No null bytes
-    if '\x00' in anchor:
-        return False
-    # Anchor should only contain alphanumeric, hyphens, underscores
-    if not re.match(r'^[a-zA-Z0-9_\-]+$', anchor):
-        return False
-    return True
-
-
-class FileContentCache:
-    """
-    Cache for file contents to avoid repeated reads.
-
-    This class provides efficient caching of file contents to reduce I/O overhead
-    when the same files are read multiple times during validation.
-    """
-
-    def __init__(self):
-        """Initialize an empty cache."""
-        self._cache: Dict[Path, str] = {}
-        self._lines_cache: Dict[Path, List[str]] = {}
-
-    def get_content(self, file_path: Path) -> str:
-        """
-        Get file content, using cache if available.
-
-        Args:
-            file_path: Path to the file to read
-
-        Returns:
-            File content as string
-
-        Raises:
-            IOError: If file cannot be read
-        """
-        if file_path not in self._cache:
-            self._cache[file_path] = file_path.read_text(encoding='utf-8')
-        return self._cache[file_path]
-
-    def get_lines(self, file_path: Path) -> List[str]:
-        """
-        Get file content as list of lines, using cache if available.
-
-        Args:
-            file_path: Path to the file to read
-
-        Returns:
-            File content as list of lines (without newline characters)
-
-        Raises:
-            IOError: If file cannot be read
-        """
-        if file_path not in self._lines_cache:
-            content = self.get_content(file_path)
-            self._lines_cache[file_path] = content.split('\n')
-        return self._lines_cache[file_path]
-
-    def clear(self):
-        """Clear all cached content."""
-        self._cache.clear()
-        self._lines_cache.clear()
-
-    def has(self, file_path: Path) -> bool:
-        """
-        Check if file content is cached.
-
-        Args:
-            file_path: Path to check
-
-        Returns:
-            True if file is cached, False otherwise
-        """
-        return file_path in self._cache
-
-
-class ValidationIssue:
-    """
-    Represents a validation issue found in markdown files.
-
-    This is a shared class used across validation scripts for consistency.
-    Issues are tracked as List[ValidationIssue] in validation functions.
-    """
-
-    def __init__(
-        self,
-        issue_type: str,
-        file_path: Path,
-        start_line: int,
-        end_line: int,
-        message: str,
-        severity: str = "error",  # "error" or "warning"
-        suggestion: Optional[str] = None,
-        heading: Optional[str] = None,
-        **kwargs
-    ):
-        """
-        Create a ValidationIssue.
-
-        Args:
-            issue_type: Type of issue (e.g., 'missing_comment', 'heading_format')
-            file_path: Path to the file (will be converted to string)
-            start_line: Starting line number
-            end_line: Ending line number
-            message: Issue message
-            severity: "error" or "warning" (default: "error")
-            suggestion: Optional suggestion for fixing
-            heading: Optional heading text
-            **kwargs: Additional type-specific fields (e.g., def_name, def_kind, etc.)
-        """
-        self.issue_type = issue_type
-        self.file = str(file_path)  # Convert Path to string
-        self.start_line = start_line
-        self.end_line = end_line
-        self.message = message
-        self.severity = severity.lower()  # Normalize to lowercase
-        if self.severity not in ('error', 'warning'):
-            raise ValueError(f"severity must be 'error' or 'warning', got '{severity}'")
-        self.suggestion = suggestion
-        self.heading = heading
-        self.extra_fields = kwargs  # Store additional fields
-
-    def to_dict(self) -> Dict:
-        """Convert to dictionary for backward compatibility (JSON, reporting, etc.)."""
-        result = {
-            'type': self.issue_type,
-            'file': self.file,
-            'start_line': self.start_line,
-            'end_line': self.end_line,
-            'message': self.message,
-            'severity': self.severity,
-        }
-        if self.suggestion:
-            result['suggestion'] = self.suggestion
-        if self.heading:
-            result['heading'] = self.heading
-        result.update(self.extra_fields)
-        return result
-
-    def format_message(self, no_color: bool = False) -> str:
-        """Format issue message using format_issue_message utility."""
-        return format_issue_message(
-            self.severity,
-            self.issue_type,
-            self.file,
-            self.start_line,
-            self.message,
-            self.suggestion,
-            no_color
-        )
-
-    def matches(
-        self,
-        issue_type: Optional[str] = None,
-        severity: Optional[str] = None
-    ) -> bool:
-        """
-        Check if this issue matches the given filter criteria.
-
-        Args:
-            issue_type: Optional issue type to match (exact match)
-            severity: Optional severity to match (exact match, case-insensitive)
-
-        Returns:
-            True if the issue matches all provided criteria, False otherwise.
-            If no criteria are provided, returns True.
-        """
-        if issue_type is not None and self.issue_type != issue_type:
-            return False
-        if severity is not None and self.severity != severity.lower():
-            return False
-        return True
-
-    def __repr__(self) -> str:
-        """String representation for debugging."""
-        return (
-            f"ValidationIssue(type={self.issue_type!r}, file={self.file!r}, "
-            f"line={self.start_line}, severity={self.severity!r})"
-        )
+Facade: re-exports from lib.validation for backward compatibility.
+Implementation lives in scripts/lib/validation/ (_core, _output, _fs, _markdown).
+"""
 
-    def __eq__(self, other) -> bool:
-        """Equality comparison."""
-        if not isinstance(other, ValidationIssue):
-            return False
-        return (
-            self.issue_type == other.issue_type
-            and self.file == other.file
-            and self.start_line == other.start_line
-            and self.end_line == other.end_line
-            and self.message == other.message
-            and self.severity == other.severity
-        )
+from lib.validation import (
+    DOCS_DIR,
+    FEATURES_DIR,
+    REQUIREMENTS_DIR,
+    TECH_SPECS_DIR,
+    COLOR_GREEN,
+    COLOR_RED,
+    COLOR_RESET,
+    COLOR_YELLOW,
+    SEPARATOR_WIDTH,
+    OutputBuilder,
+    ValidationIssue,
+    calculate_label_width,
+    colorize,
+    format_issue_message,
+    format_summary_line,
+    parse_no_color_flag,
+    supports_color,
+    FileContentCache,
+    find_feature_files,
+    find_markdown_files,
+    is_in_dot_directory,
+    HeadingContext,
+    ProseSection,
+    build_heading_hierarchy,
+    contains_url,
+    count_sentences,
+    extract_headings,
+    extract_headings_from_file,
+    extract_headings_with_anchors,
+    extract_h2_plus_headings_with_sections,
+    extract_headings_with_section_numbers,
+    find_heading_before_line,
+    find_heading_for_code_block,
+    generate_anchor_from_heading,
+    get_backticks_error_message,
+    get_common_abbreviations,
+    get_subheadings,
+    has_backticks,
+    has_code_blocks,
+    is_organizational_heading,
+    is_safe_path,
+    remove_backticks_keep_content,
+    validate_anchor,
+    validate_file_name,
+    validate_spec_file_name,
+    get_validation_exit_code,
+    get_workspace_root,
import_module_with_fallback, + parse_paths, +) + +__all__ = [ + "DOCS_DIR", + "FEATURES_DIR", + "REQUIREMENTS_DIR", + "TECH_SPECS_DIR", + "COLOR_GREEN", + "COLOR_RED", + "COLOR_RESET", + "COLOR_YELLOW", + "SEPARATOR_WIDTH", + "OutputBuilder", + "ValidationIssue", + "calculate_label_width", + "colorize", + "format_issue_message", + "format_summary_line", + "parse_no_color_flag", + "supports_color", + "FileContentCache", + "find_feature_files", + "find_markdown_files", + "is_in_dot_directory", + "HeadingContext", + "ProseSection", + "build_heading_hierarchy", + "contains_url", + "count_sentences", + "extract_headings", + "extract_headings_from_file", + "extract_headings_with_anchors", + "extract_h2_plus_headings_with_sections", + "extract_headings_with_section_numbers", + "find_heading_before_line", + "find_heading_for_code_block", + "generate_anchor_from_heading", + "get_backticks_error_message", + "get_common_abbreviations", + "get_subheadings", + "has_backticks", + "has_code_blocks", + "is_organizational_heading", + "is_safe_path", + "remove_backticks_keep_content", + "validate_anchor", + "validate_file_name", + "validate_spec_file_name", + "get_validation_exit_code", + "get_workspace_root", + "import_module_with_fallback", + "parse_paths", +] diff --git a/scripts/lib/audit_requirements/_audit_requirements_scan.py b/scripts/lib/audit_requirements/_audit_requirements_scan.py index e40baa60..b24aa869 100644 --- a/scripts/lib/audit_requirements/_audit_requirements_scan.py +++ b/scripts/lib/audit_requirements/_audit_requirements_scan.py @@ -328,12 +328,12 @@ def check_anchor_in_text_missing_href_anchor( if req_id: extra_fields['requirement_id'] = req_id - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "anchor_in_text_missing_href_anchor", Path(rel_path), line_num, line_num, - ( + message=( f"{prefix}link text includes '{spec_basename}#{anchor}' " f"but href has no '#{anchor}'" ), @@ -390,12 +390,12 @@ def 
check_requirement_tech_spec_link_thresholds( except ValueError: rel_path = req_file - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "too_many_tech_spec_links", Path(rel_path), start_line, start_line, - ( + message=( f"{req_id}: has {count} tech spec link(s) " f"(warning >= {warn_threshold}, error >= {error_threshold})" ), diff --git a/scripts/lib/go_defs_index/_go_defs_index_anchors.py b/scripts/lib/go_defs_index/_go_defs_index_anchors.py index 0c995377..72f6688c 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_anchors.py +++ b/scripts/lib/go_defs_index/_go_defs_index_anchors.py @@ -98,9 +98,10 @@ def find_section_for_definition( def check_missing_section_anchors( index_content: str, entry_to_target_md: Dict[str, str], - entry_to_link_anchor: Dict[str, Optional[str]], + _entry_to_link_anchor: Dict[str, Optional[str]], all_definitions: Dict[str, List[Tuple[str, int]]], tech_specs_dir: Path, + *, index_filename: str = "api_go_defs_index.md", file_cache: Optional[FileContentCache] = None, ) -> List[ValidationIssue]: @@ -171,12 +172,12 @@ def check_missing_section_anchors( current_link = link_match.group(0) index_file_path = tech_specs_dir / index_filename issues.append( - ValidationIssue( + ValidationIssue.create( "incorrect_section_anchor", index_file_path, line_num, line_num, - f"Index entry '{entry_name}' has incorrect or missing section anchor", + message=f"Index entry '{entry_name}' has incorrect or missing section anchor", severity="error", suggestion=suggested_link, entry_name=entry_name, @@ -189,12 +190,319 @@ def check_missing_section_anchors( return issues +def _scan_block_for_definition( + lines_content: List[str], + anchor_section_line: int, + next_heading_line: int, + code_block_start_line: int, + entry_name: str, +) -> tuple: + """Scan section for Go block and definition name. 
Returns (found, in_block, end_line).""" + definition_found_in_block = False + in_target_block = False + code_block_end: Optional[int] = None + end_row = min(next_heading_line - 1, len(lines_content)) + for i in range(anchor_section_line - 1, end_row): + line_text = lines_content[i] + current_line_num = i + 1 + if current_line_num == code_block_start_line: + in_target_block = line_text.strip() == "```go" + continue + if in_target_block: + if line_text.strip() == "```": + code_block_end = current_line_num + break + if "." in entry_name: + _receiver, method = entry_name.split(".", 1) + if (entry_name in line_text) or re.search( + rf"\\b{re.escape(method)}\\s*\\(", line_text + ): + definition_found_in_block = True + else: + if re.search(rf"\\b{re.escape(entry_name)}\\b", line_text): + definition_found_in_block = True + return (definition_found_in_block, in_target_block, code_block_end) + + +def _find_anchor_section_line(headings, anchor: str) -> Optional[int]: + """Return heading line number for matching anchor, or None.""" + for ( + _heading_level, + _heading_text, + heading_line, + heading_anchor, + section_anchor, + ) in headings: + if section_anchor and section_anchor == anchor: + return heading_line + if heading_anchor == anchor: + return heading_line + return None + + +def _validate_entry_and_get_definition_line( + entry_name: str, + target_file: Optional[str], + anchor: Optional[str], + all_definitions: Dict[str, List[Tuple[str, int]]], + index_file_path: Path, + *, + line_num: int, +) -> Tuple[Optional[ValidationIssue], Optional[int]]: + """Return (issue, None) if invalid, else (None, definition_line_in_target).""" + if not target_file or not anchor: + return (None, None) + if not validate_file_name(target_file) or not validate_anchor(anchor): + return (None, None) + if entry_name not in all_definitions: + return ( + ValidationIssue.create( + "definition_not_found", + index_file_path, + line_num, + line_num, + message=f"Definition '{entry_name}' not found in 
any tech spec file", + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + ), + None, + ) + definition_line_in_target = None + for filename, def_line in all_definitions[entry_name]: + if filename == target_file: + definition_line_in_target = def_line + break + if definition_line_in_target is None: + return ( + ValidationIssue.create( + "definition_not_in_target", + index_file_path, + line_num, + line_num, + message=f"Definition '{entry_name}' not found in target file '{target_file}'", + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + ), + None, + ) + return (None, definition_line_in_target) + + +def _issue_if_target_missing( + target_path: Path, + repo_root: Path, + index_file_path: Path, + line_num: int, + entry_name: str, + *, + target_file: str, + anchor: str, +): + """Return a ValidationIssue if the target file is missing; None if unsafe or present.""" + if not is_safe_path(target_path, repo_root): + return None + if target_path.exists(): + return None + return ValidationIssue.create( + "target_file_not_found", + index_file_path, + line_num, + line_num, + message=f"Target file '{target_file}' does not exist", + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + ) + + +def _issue_if_anchor_or_position_bad( + lines_content: List[str], + headings: list, + anchor: str, + definition_line_in_target: int, + index_file_path: Path, + *, + line_num: int, + entry_name: str, + target_file: str, +) -> Tuple[Optional[ValidationIssue], Optional[int], Optional[int]]: + """Return (issue, None, None) if bad, else (None, anchor_section_line, next_heading_line).""" + anchor_section_line = _find_anchor_section_line(headings, anchor) + if anchor_section_line is None: + return ( + ValidationIssue.create( + "anchor_no_match", + index_file_path, + line_num, + line_num, + message=f"Anchor '{anchor}' does not match any heading in target file", + severity="error", +
entry_name=entry_name, + target_file=target_file, + anchor=anchor, + ), + None, + None, + ) + next_heading_line = len(lines_content) + 1 + for _level, _text, h_line, _a, _sa in headings: + if h_line > anchor_section_line: + next_heading_line = h_line + break + if definition_line_in_target < anchor_section_line: + return ( + ValidationIssue.create( + "definition_before_anchor", + index_file_path, + line_num, + line_num, + message=( + f"Definition at line {definition_line_in_target} is before anchor " + f"section at line {anchor_section_line}" + ), + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + definition_line=definition_line_in_target, + anchor_line=anchor_section_line, + ), + None, + None, + ) + if definition_line_in_target >= next_heading_line: + return ( + ValidationIssue.create( + "definition_after_anchor", + index_file_path, + line_num, + line_num, + message=( + f"Definition at line {definition_line_in_target} is after anchor " + f"section (next heading at line {next_heading_line})" + ), + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + definition_line=definition_line_in_target, + next_heading_line=next_heading_line, + ), + None, + None, + ) + code_block_start_line = definition_line_in_target + if ( + code_block_start_line < anchor_section_line + or code_block_start_line >= next_heading_line + ): + return ( + ValidationIssue.create( + "code_block_outside_section", + index_file_path, + line_num, + line_num, + message=( + f"Definition code block at line {code_block_start_line} is not " + f"within anchor section (lines {anchor_section_line}-" + f"{next_heading_line - 1})" + ), + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + code_block_line=code_block_start_line, + section_start=anchor_section_line, + section_end=next_heading_line - 1, + ), + None, + None, + ) + return (None, anchor_section_line, next_heading_line) + + +def 
_issue_if_block_invalid( + lines_content: List[str], + anchor_section_line: int, + next_heading_line: int, + code_block_start_line: int, + entry_name: str, + *, + index_file_path: Path, + line_num: int, + target_file: str, + anchor: str, +): + """Return one ValidationIssue if block invalid/not go/unterminated/missing def, else None.""" + definition_found_in_block, in_target_block, code_block_end = _scan_block_for_definition( + lines_content, + anchor_section_line, + next_heading_line, + code_block_start_line, + entry_name, + ) + if not in_target_block: + return ValidationIssue.create( + "code_block_not_go", + index_file_path, + line_num, + line_num, + message=( + f"Definition code block at line {code_block_start_line} is not a " + "Go code block (expected ```go)" + ), + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + code_block_line=code_block_start_line, + ) + if code_block_end is None: + return ValidationIssue.create( + "code_block_unterminated", + index_file_path, + line_num, + line_num, + message=( + f"Go code block starting at line {code_block_start_line} is not closed" + ), + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + code_block_line=code_block_start_line, + ) + if not definition_found_in_block: + return ValidationIssue.create( + "definition_not_in_block", + index_file_path, + line_num, + line_num, + message=( + f"Definition '{entry_name}' not found in code block at line " + f"{code_block_start_line} within anchor section" + ), + severity="error", + entry_name=entry_name, + target_file=target_file, + anchor=anchor, + code_block_line=code_block_start_line, + ) + return None + + def check_anchor_points_to_definition( index_content: str, entry_to_target_md: Dict[str, str], entry_to_link_anchor: Dict[str, Optional[str]], all_definitions: Dict[str, List[Tuple[str, int]]], tech_specs_dir: Path, + *, index_filename: str = "api_go_defs_index.md", file_cache: 
Optional[FileContentCache] = None, ) -> List[ValidationIssue]: @@ -206,306 +514,73 @@ def check_anchor_points_to_definition( index_file_path = tech_specs_dir / index_filename lines = index_content.split("\n") entry_pattern = r"^\\s*-\\s+\\*\\*`([^`]+)`\\*\\*" + repo_root = tech_specs_dir.parent.parent for line_num, line in enumerate(lines, 1): entry_match = re.match(entry_pattern, line) if not entry_match: continue - entry_name = normalize_generic_name(entry_match.group(1)) target_file = entry_to_target_md.get(entry_name) - if not target_file: - continue - anchor = entry_to_link_anchor.get(entry_name) - if not anchor: - continue - - if not validate_file_name(target_file): - continue - if not validate_anchor(anchor): - continue - - if entry_name not in all_definitions: - issues.append( - ValidationIssue( - "definition_not_found", - index_file_path, - line_num, - line_num, - f"Definition '{entry_name}' not found in any tech spec file", - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - ) - ) + entry_issue, definition_line_in_target = _validate_entry_and_get_definition_line( + entry_name, target_file, anchor, all_definitions, + index_file_path, line_num=line_num, + ) + if entry_issue is not None: + issues.append(entry_issue) continue - - definition_found_in_target = False - definition_line_in_target: Optional[int] = None - for filename, def_line in all_definitions[entry_name]: - if filename == target_file: - definition_found_in_target = True - definition_line_in_target = def_line - break - - if not definition_found_in_target: - issues.append( - ValidationIssue( - "definition_not_in_target", - index_file_path, - line_num, - line_num, - f"Definition '{entry_name}' not found in target file '{target_file}'", - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - ) - ) + if definition_line_in_target is None: continue - target_path = tech_specs_dir / target_file - repo_root = 
tech_specs_dir.parent.parent - if not is_safe_path(target_path, repo_root): - continue - - if not target_path.exists(): - issues.append( - ValidationIssue( - "target_file_not_found", - index_file_path, - line_num, - line_num, - f"Target file '{target_file}' does not exist", - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - ) - ) + target_issue = _issue_if_target_missing( + target_path, repo_root, index_file_path, line_num, + entry_name, target_file=target_file, anchor=anchor, + ) + if target_issue is not None: + issues.append(target_issue) continue - try: lines_content = content_cache.get_lines(target_path) - headings = extract_h2_plus_headings_with_sections( target_path, skip_code_blocks=True, file_cache=file_cache ) - - # Find heading matching the anchor (either section_anchor or plain anchor). - anchor_section_line: Optional[int] = None - for ( - heading_level, - heading_text, - heading_line, - heading_anchor, - section_anchor, - ) in headings: - if section_anchor and section_anchor == anchor: - anchor_section_line = heading_line - break - if heading_anchor == anchor: - anchor_section_line = heading_line - break - - if anchor_section_line is None: - issues.append( - ValidationIssue( - "anchor_no_match", - index_file_path, - line_num, - line_num, - f"Anchor '{anchor}' does not match any heading in target file", - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - ) - ) - continue - - # Find the next heading after the anchor heading (or end of file). 
- next_heading_line = len(lines_content) + 1 - for _level, _text, h_line, _a, _sa in headings: - if h_line > anchor_section_line: - next_heading_line = h_line - break - - assert definition_line_in_target is not None - - if definition_line_in_target < anchor_section_line: - issues.append( - ValidationIssue( - "definition_before_anchor", - index_file_path, - line_num, - line_num, - ( - f"Definition at line {definition_line_in_target} is before anchor " - f"section at line {anchor_section_line}" - ), - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - definition_line=definition_line_in_target, - anchor_line=anchor_section_line, - ) - ) - continue - - if definition_line_in_target >= next_heading_line: - issues.append( - ValidationIssue( - "definition_after_anchor", - index_file_path, - line_num, - line_num, - ( - f"Definition at line {definition_line_in_target} is after anchor " - f"section (next heading at line {next_heading_line})" - ), - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - definition_line=definition_line_in_target, - next_heading_line=next_heading_line, - ) - ) - continue - - # Verify the definition code block is within the section. - code_block_start_line = definition_line_in_target - if ( - code_block_start_line < anchor_section_line - or code_block_start_line >= next_heading_line - ): - issues.append( - ValidationIssue( - "code_block_outside_section", - index_file_path, - line_num, - line_num, - ( - f"Definition code block at line {code_block_start_line} is not " - f"within anchor section (lines {anchor_section_line}-" - f"{next_heading_line - 1})" - ), - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - code_block_line=code_block_start_line, - section_start=anchor_section_line, - section_end=next_heading_line - 1, - ) - ) - continue - - # Verify the definition name actually appears in that Go code block. 
- definition_found_in_block = False - in_target_block = False - code_block_end: Optional[int] = None - - for i in range( - anchor_section_line - 1, min(next_heading_line - 1, len(lines_content)) - ): - line_text = lines_content[i] - current_line_num = i + 1 - - if current_line_num == code_block_start_line: - if line_text.strip() == "```go": - in_target_block = True - else: - # Defensive: this should not happen if all_definitions came from Go blocks. - in_target_block = False - continue - - if in_target_block: - if line_text.strip() == "```": - code_block_end = current_line_num - break - - # Check for definition name in the block. - # For methods, accept either "Type.Method" or "Method(" occurrence. - if "." in entry_name: - receiver, method = entry_name.split(".", 1) - if (entry_name in line_text) or re.search( - rf"\\b{re.escape(method)}\\s*\\(", - line_text, - ): - definition_found_in_block = True - else: - if re.search(rf"\\b{re.escape(entry_name)}\\b", line_text): - definition_found_in_block = True - - if not in_target_block: - issues.append( - ValidationIssue( - "code_block_not_go", - index_file_path, - line_num, - line_num, - ( - f"Definition code block at line {code_block_start_line} is not a " - "Go code block (expected ```go)" - ), - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - code_block_line=code_block_start_line, - ) - ) - continue - - if code_block_end is None: - issues.append( - ValidationIssue( - "code_block_unterminated", - index_file_path, - line_num, - line_num, - f"Go code block starting at line {code_block_start_line} is not closed", - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - code_block_line=code_block_start_line, - ) + pos_issue, anchor_section_line, next_heading_line = ( + _issue_if_anchor_or_position_bad( + lines_content, + headings, + anchor, + definition_line_in_target, + index_file_path, + line_num=line_num, + entry_name=entry_name, + 
target_file=target_file, ) + ) + if pos_issue is not None: + issues.append(pos_issue) continue - - if not definition_found_in_block: - issues.append( - ValidationIssue( - "definition_not_in_block", - index_file_path, - line_num, - line_num, - ( - f"Definition '{entry_name}' not found in code block at line " - f"{code_block_start_line} within anchor section" - ), - severity="error", - entry_name=entry_name, - target_file=target_file, - anchor=anchor, - code_block_line=code_block_start_line, - ) - ) - - except Exception as e: + block_issue = _issue_if_block_invalid( + lines_content, + anchor_section_line, + next_heading_line, + definition_line_in_target, + entry_name, + index_file_path=index_file_path, + line_num=line_num, + target_file=target_file, + anchor=anchor, + ) + if block_issue is not None: + issues.append(block_issue) + except (ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: issues.append( - ValidationIssue( + ValidationIssue.create( "anchor_check_error", index_file_path, line_num, line_num, - f"Error checking anchor: {e}", + message=f"Error checking anchor: {e}", severity="error", entry_name=entry_name, target_file=target_file, diff --git a/scripts/lib/go_defs_index/_go_defs_index_config.py b/scripts/lib/go_defs_index/_go_defs_index_config.py index ab293f90..f4a3af96 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_config.py +++ b/scripts/lib/go_defs_index/_go_defs_index_config.py @@ -102,54 +102,68 @@ ], # Metadata and Tags Domain "tag": [ + ("Tag Methods", "strong"), + ("FileEntry Types", "medium"), ("FileEntry Helper Functions", "medium"), - ("Metadata Types", "medium"), - ("Metadata Methods", "strong"), - ("Metadata Helper Functions", "strong"), ], "fileentrytag": [ ("FileEntry Helper Functions", "strong"), - ("Metadata Methods", "medium"), - ("Metadata Helper Functions", "medium"), ], "pathmetadatatag": [ - ("Metadata Methods", "strong"), - ("Metadata Helper Functions", "strong"), + ("Package Path Metadata Methods", 
"strong"), + ("Package Metadata Type Methods", "strong"), + ("Package Metadata Helper Functions", "strong"), ], "metadata": [ - ("Metadata Types", "strong"), - ("Metadata Methods", "strong"), - ("Metadata Helper Functions", "strong"), - ("Package Metadata Methods", "medium"), + ("Package Metadata Types", "strong"), + ("Package Comment Methods", "strong"), + ("Package Identity Methods", "strong"), + ("Package Special File Methods", "strong"), + ("Package Path Metadata Methods", "strong"), + ("Package Symlink Methods", "strong"), + ("Package Metadata-Only Methods", "strong"), + ("Package Info Methods", "strong"), + ("Package Metadata Validation Methods", "strong"), + ("Package Metadata Internal Methods", "strong"), + ("Package Metadata Type Methods", "strong"), + ("Package Metadata Helper Functions", "strong"), ], "pathmetadata": [ - ("Metadata Types", "strong"), - ("Metadata Methods", "strong"), - ("Metadata Helper Functions", "strong"), + ("Package Metadata Types", "strong"), + ("Package Path Metadata Methods", "strong"), + ("Package Metadata Type Methods", "strong"), + ("Package Metadata Helper Functions", "strong"), ], "fileentry": [ - ("FileEntry", "strong"), - ("FileEntry Methods", "strong"), + ("FileEntry Types", "strong"), + ("FileEntry Query Methods", "strong"), + ("FileEntry Data Methods", "strong"), + ("FileEntry Temp File Methods", "strong"), + ("FileEntry Serialization Methods", "strong"), + ("FileEntry Path Methods", "strong"), + ("FileEntry Transformation Methods", "strong"), ("FileEntry Helper Functions", "strong"), ], "appid": [ - ("Package Metadata Methods", "medium"), + ("Package Identity Methods", "medium"), ], "vendorid": [ - ("Package Metadata Methods", "medium"), + ("Package Identity Methods", "medium"), ], "comment": [ - ("Package Metadata Methods", "strong"), - ("Metadata Types", "medium"), + ("Package Comment Methods", "strong"), + ("Package Metadata Type Methods", "strong"), + ("Package Metadata Types", "medium"), ], "packagecomment": [ - 
("Package Metadata Methods", "strong"), - ("Metadata Types", "medium"), + ("Package Comment Methods", "strong"), + ("Package Metadata Type Methods", "strong"), + ("Package Metadata Types", "medium"), ], "validatecomment": [ ("Package Helper Functions", "strong"), - ("Package Metadata Methods", "medium"), - ("Package Metadata Methods", "medium"), + ("Package Metadata Helper Functions", "medium"), + ("Package Comment Methods", "medium"), ], "validatepathlength": [ ("Package Helper Functions", "strong"), @@ -214,14 +228,12 @@ ], # Deduplication Domain "deduplication": [ - ("Deduplication Types", "strong"), - ("Deduplication Methods", "strong"), - ("Deduplication Helper Functions", "strong"), + ("Package File Management Methods", "strong"), + ("Package Information and Queries Methods", "strong"), ], "dedup": [ - ("Deduplication Types", "medium"), - ("Deduplication Methods", "medium"), - ("Deduplication Helper Functions", "medium"), + ("Package File Management Methods", "medium"), + ("Package Information and Queries Methods", "medium"), ], # FileType System Domain "filetype": [ @@ -270,6 +282,9 @@ ] DOMAIN_FILE_MAP = { + # Deprecated: replaced by pattern-based detection in + # lib/go_defs_index/_go_defs_index_scoring_domain.py. + # Kept temporarily for reference until fully removed. 
"api_generics.md": "generic", "api_streaming.md": "streaming", "api_package_compression.md": "compression", diff --git a/scripts/lib/go_defs_index/_go_defs_index_descriptions.py b/scripts/lib/go_defs_index/_go_defs_index_descriptions.py index 9637ad3e..cb43c448 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_descriptions.py +++ b/scripts/lib/go_defs_index/_go_defs_index_descriptions.py @@ -69,12 +69,12 @@ def check_entry_descriptions( ) output.add_error_line( - ValidationIssue( + ValidationIssue.create( error_type, index_file, entry.line_number, entry.line_number, - error_msg_text, + message=error_msg_text, severity="error", suggestion=suggestion, ).format_message(no_color=output.no_color) @@ -92,12 +92,12 @@ def check_entry_descriptions( entries_list = ", ".join(f"`{name}`" for name in entry_names) first_entry = index_entries[entry_names[0]] output.add_error_line( - ValidationIssue( + ValidationIssue.create( "Duplicate description", index_file, first_entry.line_number, first_entry.line_number, - f"Multiple entries share the same description: {entries_list}", + message=f"Multiple entries share the same description: {entries_list}", severity="error", suggestion="Each entry should have a unique description", ).format_message(no_color=output.no_color) diff --git a/scripts/lib/go_defs_index/_go_defs_index_discovery.py b/scripts/lib/go_defs_index/_go_defs_index_discovery.py index 7aab79cb..a7176145 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_discovery.py +++ b/scripts/lib/go_defs_index/_go_defs_index_discovery.py @@ -2,22 +2,17 @@ from pathlib import Path import re -import sys from typing import Dict, List, Optional -scripts_dir = Path(__file__).resolve().parents[2] -if str(scripts_dir) not in sys.path: - sys.path.insert(0, str(scripts_dir)) - -from lib._go_code_utils import ( # noqa: E402 +from lib._go_code_utils import ( extract_go_doc_comment_above, find_go_code_blocks, is_example_code, normalize_generic_name, parse_go_def_signature, ) -from 
lib.go_defs_index._go_defs_index_models import DetectedDefinition # noqa: E402 -from lib._validation_utils import ( # noqa: E402 +from lib.go_defs_index._go_defs_index_models import DetectedDefinition +from lib._validation_utils import ( FileContentCache, OutputBuilder, ValidationIssue, @@ -27,10 +22,12 @@ find_heading_before_line, find_heading_for_code_block, ) -from lib.go_defs_index._go_defs_index_headings import resolve_canonical_reference # noqa: E402 +from lib.go_defs_index._go_defs_index_headings import resolve_canonical_reference INDEX_FILENAME = "api_go_defs_index.md" -_QUALIFIED_TYPE_RE = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)\b") +_QUALIFIED_TYPE_RE = re.compile( + r"\b([A-Za-z_][A-Za-z0-9_]*)\.([A-Za-z_][A-Za-z0-9_]*)\b" +) _IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*") @@ -108,6 +105,180 @@ def _extract_signature_types(sig) -> tuple[List[str], List[str], List[str], List return input_types, output_types, referenced_types, referenced_methods +def _build_duplicate_groups( + definitions_by_name: Dict[str, List[DetectedDefinition]], +) -> Dict[str, List[DetectedDefinition]]: + """Return groups of definitions with same name in multiple files.""" + duplicate_groups: Dict[str, List[DetectedDefinition]] = {} + for name, defns in definitions_by_name.items(): + if len(defns) > 1: + files = {d.file for d in defns} + if len(files) > 1: + duplicate_groups[name] = defns + return duplicate_groups + + +def _emit_duplicate_errors( + duplicate_groups: Dict[str, List[DetectedDefinition]], + output: OutputBuilder, + tech_specs_dir: Path, + content_cache: FileContentCache, +) -> None: + """Emit error lines for duplicate definition groups.""" + output.add_errors_header() + output.add_line( + f"Found {len(duplicate_groups)} duplicate definition(s) across multiple files:", + section="error", + ) + output.add_blank_line("error") + for name, defns in sorted(duplicate_groups.items()): + locations = [] + for defn in sorted(defns, key=lambda d: 
(d.file, d.code_block_start_line)): + anchor = _anchor_for_definition(defn, tech_specs_dir, content_cache) + locations.append(f"{defn.file}{anchor}:{defn.code_block_start_line}") + locations_str = f"({','.join(locations)})" + first_defn = defns[0] + output.add_error_line( + ValidationIssue.create( + "Duplicate definition", + Path(first_defn.file), + first_defn.code_block_start_line, + first_defn.code_block_start_line, + message=f"Definition '{name}' found in multiple files {locations_str}", + severity="error", + ).format_message(no_color=output.no_color) + ) + + +def _anchor_for_definition(defn, tech_specs_dir: Path, content_cache) -> str: + """Return anchor string for a definition (heading-based or #line-N).""" + file_path = tech_specs_dir / defn.file + if not file_path.exists(): + return f"#line-{defn.code_block_start_line}" + try: + file_content = content_cache.get_content(file_path) + heading = find_heading_for_code_block( + file_content, defn.code_block_start_line + ) + if heading: + return generate_anchor_from_heading(heading, include_hash=True) + except (ValueError, AttributeError, KeyError, OSError, RuntimeError, TypeError): + pass + return f"#line-{defn.code_block_start_line}" + + +def _definition_from_signature( + sig, + block_lines: List[str], + line_index: int, + *, + md_file: Path, + start_line: int, + code_content: str, +) -> DetectedDefinition: + """Build one DetectedDefinition from a parsed signature (kind dispatch).""" + def_comments = extract_go_doc_comment_above(block_lines, line_index) + if sig.kind == "method" and sig.receiver: + receiver_type = normalize_generic_name(sig.receiver) + normalized_name = f"{receiver_type}.{sig.name}" + return DetectedDefinition( + name=normalized_name, + kind="method", + file=md_file.name, + code_block_start_line=start_line, + code_block_content=code_content, + raw_name=sig.name, + receiver_type=receiver_type, + def_comments=def_comments or None, + ) + if sig.kind == "func": + normalized_name = 
normalize_generic_name(sig.name) + ( + input_types, + output_types, + referenced_types, + referenced_methods, + ) = _extract_signature_types(sig) + return DetectedDefinition( + name=normalized_name, + kind="func", + file=md_file.name, + code_block_start_line=start_line, + code_block_content=code_content, + raw_name=sig.name, + input_types=input_types, + output_types=output_types, + referenced_types=referenced_types, + referenced_methods=referenced_methods, + def_comments=def_comments or None, + ) + if sig.kind in ("interface", "struct"): + normalized_name = normalize_generic_name(sig.name) + return DetectedDefinition( + name=normalized_name, + kind=sig.kind, + file=md_file.name, + code_block_start_line=start_line, + code_block_content=code_content, + raw_name=sig.name, + def_comments=def_comments or None, + ) + normalized_name = normalize_generic_name(sig.name) + return DetectedDefinition( + name=normalized_name, + kind="type", + file=md_file.name, + code_block_start_line=start_line, + code_block_content=code_content, + raw_name=sig.name, + def_comments=def_comments or None, + ) + + +def _definitions_from_block( + block_lines: List[str], + code_content: str, + start_line: int, + *, + md_file: Path, + content: str, + lines: List[str], + headings: List[tuple], + tech_specs_dir: Path, + output: Optional[OutputBuilder], + content_cache: FileContentCache, +) -> List[DetectedDefinition]: + """Return all DetectedDefinitions from one code block (line defs only).""" + heading_above = find_heading_before_line(content, start_line) + heading_text = ( + heading_above.heading_text if heading_above and heading_above.heading_text else None + ) + result: List[DetectedDefinition] = [] + for i, line in enumerate(block_lines): + if is_example_code( + code_content, start_line, + lines=lines, heading_text=heading_text, check_single_line=i, + ): + continue + sig = parse_go_def_signature( + line, location=f"{md_file.name}:{start_line + i}", + ) + if sig is None: + continue + defn = 
_definition_from_signature( + sig, block_lines, i, + md_file=md_file, start_line=start_line, code_content=code_content, + ) + _populate_heading_context( + defn, content, lines, headings, + tech_specs_dir=tech_specs_dir, + output=output, + content_cache=content_cache, + ) + result.append(defn) + return result + + def discover_all_definitions( tech_specs_dir: Path, output: Optional[OutputBuilder] = None, @@ -132,15 +303,11 @@ def discover_all_definitions( definitions: List[DetectedDefinition] = [] definitions_by_name: Dict[str, List[DetectedDefinition]] = {} content_cache = file_cache or FileContentCache() - - # Get all markdown files, excluding index files files_to_check = [ f for f in tech_specs_dir.glob("*.md") - if not is_in_dot_directory(f) - and f.name != index_filename + if not is_in_dot_directory(f) and f.name != index_filename ] - if output and output.verbose: msg = f"Scanning {len(files_to_check)} tech spec files for definitions..." output.add_verbose_line(msg) @@ -151,190 +318,36 @@ def discover_all_definitions( lines = content_cache.get_lines(md_file) headings = extract_headings(content, skip_code_blocks=True) go_blocks = find_go_code_blocks(content) - for start_line, _end_line, code_content in go_blocks: block_lines = code_content.split("\n") - - # Find heading above this code block for example detection - heading_above = find_heading_before_line(content, start_line) - heading_text = ( - heading_above.heading_text - if heading_above and heading_above.heading_text - else None + block_defs = _definitions_from_block( + block_lines, code_content, start_line, + md_file=md_file, content=content, lines=lines, headings=headings, + tech_specs_dir=tech_specs_dir, output=output, content_cache=content_cache, ) - - for i, line in enumerate(block_lines): - # Check if this is example code (includes heading check) - is_example = is_example_code( - code_content, - start_line, - lines=lines, - heading_text=heading_text, - check_single_line=i, - ) - - sig = 
parse_go_def_signature( - line, - location=f"{md_file.name}:{start_line + i}", - ) - if sig is None: - continue - if is_example: - continue - - if sig.kind == "method" and sig.receiver: - receiver_type = normalize_generic_name(sig.receiver) - normalized_name = f"{receiver_type}.{sig.name}" - def_comments = extract_go_doc_comment_above(block_lines, i) - defn = DetectedDefinition( - name=normalized_name, - kind="method", - file=md_file.name, - code_block_start_line=start_line, - code_block_content=code_content, - raw_name=sig.name, - receiver_type=receiver_type, - def_comments=def_comments or None, - ) - elif sig.kind == "func": - normalized_name = normalize_generic_name(sig.name) - def_comments = extract_go_doc_comment_above(block_lines, i) - ( - input_types, - output_types, - referenced_types, - referenced_methods, - ) = _extract_signature_types(sig) - defn = DetectedDefinition( - name=normalized_name, - kind="func", - file=md_file.name, - code_block_start_line=start_line, - code_block_content=code_content, - raw_name=sig.name, - input_types=input_types, - output_types=output_types, - referenced_types=referenced_types, - referenced_methods=referenced_methods, - def_comments=def_comments or None, - ) - elif sig.kind in ("interface", "struct"): - normalized_name = normalize_generic_name(sig.name) - def_comments = extract_go_doc_comment_above(block_lines, i) - defn = DetectedDefinition( - name=normalized_name, - kind=sig.kind, - file=md_file.name, - code_block_start_line=start_line, - code_block_content=code_content, - raw_name=sig.name, - def_comments=def_comments or None, - ) - else: - # Preserve existing behavior: all non-struct/interface type-ish - # definitions are treated as "type" (includes aliases). 
- normalized_name = normalize_generic_name(sig.name) - def_comments = extract_go_doc_comment_above(block_lines, i) - defn = DetectedDefinition( - name=normalized_name, - kind="type", - file=md_file.name, - code_block_start_line=start_line, - code_block_content=code_content, - raw_name=sig.name, - def_comments=def_comments or None, - ) - - _populate_heading_context( - defn, - content, - lines, - headings, - tech_specs_dir, - output, - content_cache, - ) + for defn in block_defs: definitions.append(defn) - definitions_by_name.setdefault(normalized_name, []).append(defn) - continue + definitions_by_name.setdefault(defn.name, []).append(defn) - except Exception as e: + except (OSError, UnicodeDecodeError, ValueError, RuntimeError) as e: if output: output.add_error_line( - ValidationIssue( + ValidationIssue.create( "Error reading file", Path(md_file.name), 0, 0, - f"Could not read file: {e}", + message=f"Could not read file: {e}", severity="error", ).format_message(no_color=output.no_color) ) continue - # Check for duplicates - duplicate_groups: Dict[str, List[DetectedDefinition]] = {} - for name, defns in definitions_by_name.items(): - if len(defns) > 1: - # Check if they're in different files (might be intentional) - files = {d.file for d in defns} - if len(files) > 1: - # Same name in multiple files - flag as ERROR - duplicate_groups[name] = defns - + duplicate_groups = _build_duplicate_groups(definitions_by_name) if duplicate_groups and output: - output.add_errors_header() - total_duplicates = sum(len(defns) for defns in duplicate_groups.values()) - output.add_line( - f"Found {len(duplicate_groups)} duplicate definition(s) across multiple files:", - section="error", + _emit_duplicate_errors( + duplicate_groups, output, tech_specs_dir, content_cache ) - output.add_blank_line("error") - for name, defns in sorted(duplicate_groups.items()): - # Build location list with anchors - read headings on-the-fly - locations = [] - for defn in sorted(defns, key=lambda d: (d.file, 
d.code_block_start_line)): - # Read file to get heading for anchor - file_path = tech_specs_dir / defn.file - anchor = "" - if file_path.exists(): - try: - file_content = content_cache.get_content(file_path) - heading = find_heading_for_code_block( - file_content, defn.code_block_start_line - ) - if heading: - anchor = generate_anchor_from_heading( - heading, include_hash=True - ) - else: - # Fallback: generate anchor from line number - anchor = f"#line-{defn.code_block_start_line}" - except (ValueError, AttributeError, KeyError): - # Data structure errors - fallback to line number anchor - anchor = f"#line-{defn.code_block_start_line}" - except Exception: - # Unexpected errors - fallback to line number anchor - anchor = f"#line-{defn.code_block_start_line}" - else: - anchor = f"#line-{defn.code_block_start_line}" - - locations.append(f"{defn.file}{anchor}:{defn.code_block_start_line}") - locations_str = f"({','.join(locations)})" - - # Use first definition for the error message location - first_defn = defns[0] - output.add_error_line( - ValidationIssue( - "Duplicate definition", - Path(first_defn.file), - first_defn.code_block_start_line, - first_defn.code_block_start_line, - f"Definition '{name}' found in multiple files {locations_str}", - severity="error", - ).format_message(no_color=output.no_color) - ) - if output and output.verbose: output.add_verbose_line(f"Found {len(definitions)} total definitions") if duplicate_groups: @@ -353,6 +366,7 @@ def _populate_heading_context( content: str, lines: List[str], headings: List[tuple[str, int, int]], + *, tech_specs_dir: Path, output: Optional[OutputBuilder], content_cache: FileContentCache, diff --git a/scripts/lib/go_defs_index/_go_defs_index_headings.py b/scripts/lib/go_defs_index/_go_defs_index_headings.py index 4cf5e7a8..776d180f 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_headings.py +++ b/scripts/lib/go_defs_index/_go_defs_index_headings.py @@ -16,6 +16,67 @@ ) +def 
_current_heading_result(definition): + """Return (file, heading, anchor) for the definition's current heading.""" + anchor = generate_anchor_from_heading(definition.heading, include_hash=True) + return (definition.file, definition.heading, anchor) + + +def _parse_canonical_link_target(link_target: str, definition) -> Tuple[str, str]: + """Parse link target into (canonical_file, canonical_anchor).""" + if "#" in link_target: + file_part, anchor_part = link_target.split("#", 1) + canonical_file = file_part if file_part else definition.file + canonical_anchor = "#" + anchor_part + else: + canonical_file = link_target if link_target else definition.file + canonical_anchor = "" + return (canonical_file, canonical_anchor) + + +def _emit_canonical_error( + output: Optional[OutputBuilder], + definition, + message: str, + issue_message: str, +) -> None: + """Emit a canonical validation error line if output is set.""" + if output: + output.add_error_line( + ValidationIssue.create( + message, + Path(definition.file), + definition.heading_line, + definition.heading_line, + message=issue_message, + severity="error", + ).format_message(no_color=output.no_color) + ) + + +def _find_canonical_heading_in_file( + content_cache: FileContentCache, + canonical_file_path: Path, + canonical_anchor: str, +) -> Optional[str]: + """Return canonical heading text if anchor found in file, else None.""" + try: + canonical_content = content_cache.get_content(canonical_file_path) + canonical_headings = extract_headings( + canonical_content, skip_code_blocks=True + ) + anchor_without_hash = canonical_anchor.lstrip("#") + for heading_text, _level, _line_num in canonical_headings: + heading_anchor = generate_anchor_from_heading( + heading_text, include_hash=True + ) + if heading_anchor.lstrip("#") == anchor_without_hash: + return heading_text + except (OSError, UnicodeDecodeError, ValueError, RuntimeError): + pass + return None + + def resolve_canonical_reference( definition, content: str, @@ -32,161 
+93,69 @@ def resolve_canonical_reference( (canonical_file, canonical_heading, canonical_anchor) """ repo_root = get_workspace_root() - if not definition.section_content: - # No section content, use current heading - anchor = generate_anchor_from_heading(definition.heading, include_hash=True) - return (definition.file, definition.heading, anchor) - + return _current_heading_result(definition) section_content_lower = definition.section_content.lower() - - # Exception: if section contains "this is the canonical", treat as canonical if "this is the canonical" in section_content_lower: - anchor = generate_anchor_from_heading(definition.heading, include_hash=True) - return (definition.file, definition.heading, anchor) - - # Search for "canonical" keyword (case-insensitive) - canonical_match = re.search(r"\bcanonical\b", definition.section_content, re.IGNORECASE) + return _current_heading_result(definition) + canonical_match = re.search( + r"\bcanonical\b", definition.section_content, re.IGNORECASE + ) if not canonical_match: - # No canonical reference, use current heading - anchor = generate_anchor_from_heading(definition.heading, include_hash=True) - return (definition.file, definition.heading, anchor) + return _current_heading_result(definition) - # Find link on same line or following lines - # Calculate line offset of canonical match lines_before_match = definition.section_content[: canonical_match.start()].count("\n") - canonical_line_start = definition.heading_line - 1 + lines_before_match # 0-indexed - - # Search from canonical line to end of section for markdown link - search_start = canonical_line_start + canonical_line_start = definition.heading_line - 1 + lines_before_match content_lines = len(content.split("\n")) section_lines = definition.section_content.count("\n") search_end = min(content_lines, definition.heading_line - 1 + section_lines) - search_lines = content.split("\n")[search_start:search_end + 1] + search_lines = 
content.split("\n")[canonical_line_start:search_end + 1] search_content = "\n".join(search_lines) - link_pattern = r"\[([^\]]+)\]\(([^)]+)\)" link_match = re.search(link_pattern, search_content) - - if link_match: - link_target = link_match.group(2) # e.g., "api_core.md#20-package-interface" - - # Parse link target - if "#" in link_target: - file_part, anchor_part = link_target.split("#", 1) - canonical_file = file_part if file_part else definition.file - canonical_anchor = "#" + anchor_part - else: - canonical_file = link_target if link_target else definition.file - canonical_anchor = "" - - # Validate file name - if not validate_file_name(canonical_file): - if output: - output.add_error_line( - ValidationIssue( - "Invalid canonical link", - Path(definition.file), - definition.heading_line, - definition.heading_line, - f"Canonical link points to invalid file: {canonical_file}", - severity="error", - ).format_message(no_color=output.no_color) - ) - anchor = generate_anchor_from_heading(definition.heading, include_hash=True) - return (definition.file, definition.heading, anchor) - - # Resolve file path - canonical_file_path = tech_specs_dir / canonical_file - if not canonical_file_path.exists(): - if output: - output.add_error_line( - ValidationIssue( - "Canonical file not found", - Path(definition.file), - definition.heading_line, - definition.heading_line, - f"Canonical link points to non-existent file: {canonical_file}", - severity="error", - ).format_message(no_color=output.no_color) - ) - anchor = generate_anchor_from_heading(definition.heading, include_hash=True) - return (definition.file, definition.heading, anchor) - - # Validate path is within repository - if not is_safe_path(canonical_file_path, repo_root): - if output: - output.add_error_line( - ValidationIssue( - "Unsafe canonical path", - Path(definition.file), - definition.heading_line, - definition.heading_line, - f"Canonical link points outside repository: {canonical_file}", - severity="error", - 
).format_message(no_color=output.no_color) - ) - anchor = generate_anchor_from_heading(definition.heading, include_hash=True) - return (definition.file, definition.heading, anchor) - - content_cache = file_cache or FileContentCache() - - # Read target file and find heading for anchor - try: - canonical_content = content_cache.get_content(canonical_file_path) - canonical_headings = extract_headings( - canonical_content, skip_code_blocks=True - ) - - # Find heading that matches the anchor - anchor_without_hash = canonical_anchor.lstrip("#") - canonical_heading_text = None - - for heading_text, level, line_num in canonical_headings: - heading_anchor = generate_anchor_from_heading( - heading_text, include_hash=True - ) - if heading_anchor.lstrip("#") == anchor_without_hash: - canonical_heading_text = heading_text - break - - if canonical_heading_text: - return (canonical_file, canonical_heading_text, canonical_anchor) - else: - # Anchor not found, use current heading - if output: - output.add_error_line( - ValidationIssue( - "Canonical anchor not found", - Path(definition.file), - definition.heading_line, - definition.heading_line, - f"Canonical anchor '{canonical_anchor}' not found in {canonical_file}", - severity="error", - ).format_message(no_color=output.no_color) - ) - anchor = generate_anchor_from_heading( - definition.heading, include_hash=True - ) - return (definition.file, definition.heading, anchor) - - except Exception as e: - if output: - output.add_error_line( - ValidationIssue( - "Error reading canonical file", - Path(definition.file), - definition.heading_line, - definition.heading_line, - f"Could not read canonical file {canonical_file}: {e}", - severity="error", - ).format_message(no_color=output.no_color) - ) - anchor = generate_anchor_from_heading( - definition.heading, include_hash=True - ) - return (definition.file, definition.heading, anchor) - - # No link found after "canonical", use current heading - anchor = 
generate_anchor_from_heading(definition.heading, include_hash=True) - return (definition.file, definition.heading, anchor) + if not link_match: + return _current_heading_result(definition) + + link_target = link_match.group(2) + canonical_file, canonical_anchor = _parse_canonical_link_target( + link_target, definition + ) + if not validate_file_name(canonical_file): + _emit_canonical_error( + output, + definition, + "Invalid canonical link", + f"Canonical link points to invalid file: {canonical_file}", + ) + return _current_heading_result(definition) + canonical_file_path = tech_specs_dir / canonical_file + if not canonical_file_path.exists(): + _emit_canonical_error( + output, + definition, + "Canonical file not found", + f"Canonical link points to non-existent file: {canonical_file}", + ) + return _current_heading_result(definition) + if not is_safe_path(canonical_file_path, repo_root): + _emit_canonical_error( + output, + definition, + "Unsafe canonical path", + f"Canonical link points outside repository: {canonical_file}", + ) + return _current_heading_result(definition) + + content_cache = file_cache or FileContentCache() + canonical_heading_text = _find_canonical_heading_in_file( + content_cache, canonical_file_path, canonical_anchor + ) + if canonical_heading_text: + return (canonical_file, canonical_heading_text, canonical_anchor) + _emit_canonical_error( + output, + definition, + "Canonical anchor not found", + f"Canonical anchor '{canonical_anchor}' not found in {canonical_file}", + ) + return _current_heading_result(definition) diff --git a/scripts/lib/go_defs_index/_go_defs_index_matching.py b/scripts/lib/go_defs_index/_go_defs_index_matching.py index 26b70120..e9efbffb 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_matching.py +++ b/scripts/lib/go_defs_index/_go_defs_index_matching.py @@ -15,6 +15,9 @@ from lib.go_defs_index._go_defs_index_models import DetectedDefinition, IndexEntry from lib._index_utils import IndexSection from 
lib.go_defs_index._go_defs_index_scoring import calculate_confidence_score +from lib.go_defs_index._go_defs_index_matching_helpers import ( + adjust_related_section_for_function, +) from lib.go_defs_index._go_defs_index_shared import map_implementation_to_interface from lib._validation_utils import OutputBuilder @@ -37,8 +40,9 @@ def _expected_kind_for_definition(defn: DetectedDefinition) -> str: def _definition_sort_key(defn: DetectedDefinition) -> Tuple[int, str]: - kind_order = {"type": 0, "method": 1, "func": 2} - return (kind_order.get(defn.kind, 99), defn.name.lower()) + if defn.kind in ("method", "func"): + return (1 if defn.kind == "method" else 2, defn.name.lower()) + return (0, defn.name.lower()) def _populate_section_valid_types(index_sections: List[IndexSection]) -> None: @@ -60,7 +64,7 @@ def _populate_section_valid_types(index_sections: List[IndexSection]) -> None: sec.valid_types = set() for sec in index_sections: - if sec.kind != "method": + if sec.kind not in ("method", "func"): continue if sec.parent_heading and sec.parent_heading.kind == "type": sec.valid_types = set(sec.parent_heading.valid_types) @@ -68,57 +72,161 @@ def _populate_section_valid_types(index_sections: List[IndexSection]) -> None: _METHOD_CATEGORY_RULES = { "Package": [ - ("Package Signature Methods", ["signature", "sign", "validatesignature"]), - ("Package Lifecycle Methods", ["open", "close", "isopen", "isreadonly"]), - ("Package File Management Methods", ["addfile", "removefile", "extract"]), + ("Package Comment Methods", ["comment"]), + ("Package Identity Methods", ["appid", "vendorid", "identity", "packageidentity"]), ( - "Package Information and Queries Methods", - ["getinfo", "getmetadata", "listfiles", "fileexists", "getfile"], + "Package Special File Methods", + [ + "indexfile", + "manifestfile", + "metadatafile", + "signaturefile", + "specialfile", + "specialmetadata", + ], ), - ("Package Metadata Methods", ["comment", "appid", "vendorid", "identity"]), - ("Package 
Compression Methods", ["compress", "decompress"]), ( - "Package Path and Configuration Methods", - ["targetpath", "extractroot", "sessionbase"], + "Package Path Metadata Methods", + [ + "pathmetadata", + "directorymetadata", + "filepathassociation", + "pathconflicts", + "pathstats", + "pathtree", + "pathfiles", + "filesinpath", + "destpath", + "targetexists", + ], ), - ], - "FileEntry": [ + ("Package Symlink Methods", ["symlink"]), + ("Package Metadata-Only Methods", ["metadataonly"]), + ("Package Info Methods", ["packageinfo"]), ( - "FileEntry Data Management Methods", - ["loaddata", "unloaddata", "setdata", "getdata"], + "Package Metadata Validation Methods", + [ + "validatemetadataonly", + "validatepathmetadata", + "validatespecial", + "validatesymlink", + "validatepathwithin", + ], ), + ("Package Metadata Internal Methods", ["load", "save", "updatefilepathassociations"]), ( - "FileEntry Transformation Methods", - ["compress", "decompress", "encrypt", "decrypt", "transform"], + "Package File Encryption Methods", + ["encryptfile", "decryptfile", "validatefileencryption", "fileencryptioninfo"], + ), + ("Package Write Methods", ["safewrite", "fastwrite", "write"]), + ("Package Signature Management Methods", ["signature", "sign", "validatesignature"]), + ( + "Package Lifecycle Methods", + ["open", "close", "validate", "integrity", "defragment"], ), - ("FileEntry Tag Management Methods", ["tag", "tags"]), ( - "FileEntry Path and Metadata Methods", - ["path", "metadata", "associate"], + "Package File Management Methods", + ["addfile", "removefile", "extract", "updatefile", "addfilepath", "removefilepath"], ), - ("FileEntry Source Management Methods", ["source", "current", "original"]), - ("FileEntry Marshaling Methods", ["marshal", "unmarshal"]), + ("Package Compression Methods", ["compress", "compressed", "compression", "decompress"]), ( - "FileEntry Query Methods", - ["is", "has", "get", "compressed", "encrypted"], + "Package Information and Queries Methods", + [ + 
"getinfo", + "getmetadata", + "listfiles", + "fileexists", + "getfile", + "getpath", + "getpathstats", + "getpathmetadata", + "isopen", + "isreadonly", + "securitystatus", + "multipath", + "list", + "find", + "has", + ], + ), + ( + "Package Path and Configuration Methods", + ["targetpath", "extractroot", "sessionbase"], ), ], - "PackageReader": [ - ("PackageReader Read Operations", ["read", "readfile"]), - ("PackageReader Query Operations", ["list", "getinfo", "getmetadata", "query"]), + "FileEntry": [ + ("FileEntry Data Methods", ["getdata", "setdata", "loaddata", "unloaddata", "data"]), + ("FileEntry Temp File Methods", ["tempfile", "temp"]), + ("FileEntry Serialization Methods", ["marshal", "writedata", "writemeta", "writeto"]), + ("FileEntry Path Methods", ["path", "symlink", "associate", "resolve"]), + ( + "FileEntry Transformation Methods", + [ + "compress", + "decompress", + "encrypt", + "decrypt", + "transform", + "process", + "pipeline", + "set", + "unset", + "current", + "original", + "processingstate", + "validate", + "cleanup", + "resume", + "execute", + "copy", + ], + ), + ("FileEntry Query Methods", ["get", "has", "is"]), ], - "PackageWriter": [ - ("PackageWriter Write Operations", ["write", "writedata", "writefile"]), + "Tag": [ + ("Tag Methods", ["get", "set"]), ], } _METHOD_CATEGORY_DEFAULTS = { "Package": "Package Other Methods", - "FileEntry": "FileEntry Other Methods", - "PackageReader": "PackageReader Other Methods", - "PackageWriter": "PackageWriter Other Methods", + "FileEntry": "FileEntry Query Methods", + "Tag": "Tag Methods", } +_PACKAGE_METHOD_OVERRIDE_EXACT = { + "addpathtoexistingentry": "Package File Management Methods", + "isopen": "Package Information and Queries Methods", + "loadpathmetadata": "Package Metadata Internal Methods", + "loadspecialmetadatafiles": "Package Metadata Internal Methods", + "loadsymlinkmetadatafile": "Package Symlink Methods", + "savepathmetadatafile": "Package Metadata Internal Methods", + 
"savesymlinkmetadatafile": "Package Metadata Internal Methods", + "updatefilemetadata": "Package Path Metadata Methods", + "validatemetadataonlyintegrity": "Package Metadata Validation Methods", + "validatemetadataonlypackage": "Package Metadata Validation Methods", + "validatepathmetadata": "Package Metadata Validation Methods", + "validatespecialfiles": "Package Metadata Validation Methods", + "validatesymlinkpaths": "Package Metadata Validation Methods", +} + +_PACKAGE_METHOD_OVERRIDE_CONTAINS = [ + ("validatefileencryption", "Package File Encryption Methods"), + ("encryptioninfo", "Package File Encryption Methods"), + ("compressioninfo", "Package Compression Methods"), + ("listcompressedfiles", "Package Compression Methods"), + ("bytag", "Package Information and Queries Methods"), + ("metadataindex", "Package Compression Methods"), + ("multipath", "Package Information and Queries Methods"), + ("filepathassociations", "Package Metadata Internal Methods"), + ("sessionbase", "Package Path and Configuration Methods"), + ("targetpath", "Package Path and Configuration Methods"), +] + +_PACKAGE_METHOD_OVERRIDE_PREFIXES = [ + ("updatefile", "Package File Management Methods"), +] + def _categorize_by_keywords( method_lower: str, @@ -131,7 +239,50 @@ def _categorize_by_keywords( return fallback -def categorize_method(method_name: str, receiver_type: str) -> str: +def _package_category_overrides( + method_lower: str, +) -> Optional[str]: + exact_override = _PACKAGE_METHOD_OVERRIDE_EXACT.get(method_lower) + if exact_override: + return exact_override + for token, category in _PACKAGE_METHOD_OVERRIDE_CONTAINS: + if token in method_lower: + return category + for prefix, category in _PACKAGE_METHOD_OVERRIDE_PREFIXES: + if method_lower.startswith(prefix) and "pattern" not in method_lower: + return category + return None + + +def _package_category_from_file( + defn: DetectedDefinition, + method_lower: str, +) -> Optional[str]: + if not defn.file: + return None + file_name = 
defn.file + if file_name in ( + "api_file_mgmt_addition.md", + "api_file_mgmt_removal.md", + "api_file_mgmt_extraction.md", + ): + return "Package File Management Methods" + if file_name == "api_file_mgmt_queries.md": + if "compressed" in method_lower: + return "Package Compression Methods" + return "Package Information and Queries Methods" + if file_name == "api_deduplication.md": + return "Package Information and Queries Methods" + if file_name == "api_signatures.md": + return "Package Signature Management Methods" + if file_name == "api_package_compression.md": + return "Package Compression Methods" + if file_name == "api_security.md" and "signature" in method_lower: + return "Package Signature Management Methods" + return None + + +def categorize_method(defn: DetectedDefinition, receiver_type: str) -> str: """ Categorize a method into a logical group for sub-subsection placement. @@ -139,11 +290,59 @@ def categorize_method(method_name: str, receiver_type: str) -> str: "FileEntry Data Management Methods", etc. The category name includes the type prefix. """ + method_name = defn.name.split(".", 1)[1] if "." in defn.name else defn.name method_lower = method_name.lower() if receiver_type == "PathMetadataEntry": - # PathMetadataEntry methods are grouped under "Metadata Methods" in the index. - return "Metadata Methods" + # PathMetadataEntry methods are grouped under Package Path Metadata Methods. 
+ return "Package Path Metadata Methods" + + if receiver_type == "Tag": + return "Tag Methods" + + if receiver_type == "Package": + override_category = _package_category_overrides(method_lower) + if override_category: + return override_category + file_category = _package_category_from_file(defn, method_lower) + if file_category: + return file_category + + if receiver_type == "FileEntry": + if method_lower in {"getdata", "setdata", "loaddata", "unloaddata"}: + return "FileEntry Data Methods" + if "tempfile" in method_lower: + return "FileEntry Temp File Methods" + if method_lower.startswith(("marshal", "writedata", "writemeta", "writeto")): + return "FileEntry Serialization Methods" + if any( + token in method_lower + for token in ("pathmetadata", "symlink", "path", "associate", "resolve") + ): + return "FileEntry Path Methods" + if method_lower.startswith(("get", "has", "is")): + return "FileEntry Query Methods" + if method_lower.startswith( + ( + "set", + "compress", + "decompress", + "encrypt", + "decrypt", + "process", + "transform", + "resume", + "execute", + "cleanup", + "validate", + "copy", + "unset", + ) + ) or any( + token in method_lower + for token in ("pipeline", "current", "original", "processingstate") + ): + return "FileEntry Transformation Methods" categories = _METHOD_CATEGORY_RULES.get(receiver_type) if not categories: @@ -170,7 +369,7 @@ def _get_section_by_receiver( receiver_type: str, type_sections: List[IndexSection], ) -> Optional[IndexSection]: - receiver = normalize_generic_name(receiver_type).lower() + receiver = normalize_generic_name(receiver_type) for section in type_sections: if receiver in section.expected_entries: return section @@ -199,11 +398,14 @@ def _select_method_section_by_category( receiver_type: str, candidates: List[IndexSection], ) -> Optional[IndexSection]: - method_name = defn.name.split(".", 1)[1] if "." 
in defn.name else defn.name - category = categorize_method(method_name, receiver_type) + category = categorize_method(defn, receiver_type) for section in candidates: if section.heading_text == category: return section + if receiver_type not in _METHOD_CATEGORY_RULES: + for section in candidates: + if section.heading_text.endswith("Other Type Methods"): + return section if receiver_type == "Package" and _is_signature_package_method(defn): for section in candidates: if section.heading_text == "Package Other Methods": @@ -285,7 +487,7 @@ def _place_type_definitions( definitions: List[DetectedDefinition], context: PlacementContext, ) -> None: - for defn in [d for d in definitions if d.kind == "type"]: + for defn in [d for d in definitions if d.kind not in ("method", "func")]: best_section, best_score, best_reasoning = _best_section_for_definition( defn, context.type_sections, @@ -316,37 +518,50 @@ def _place_method_definition( _add_unresolved_entry(defn, context.unsorted_methods) return receiver = _normalize_receiver_type(defn.receiver_type) - if receiver in context.unresolved_types: - current_section = context.parsed_index.find_section_by_current_entry(defn.name) - if current_section and current_section.kind == "method": - entry = defn.to_index_entry(current_section.path_label()) - entry.suggested_section = current_section.path_label() - current_section.expected_entries[entry.name] = entry - return - _add_unresolved_entry(defn, context.unsorted_methods) - return receiver_section = _get_section_by_receiver(receiver, context.type_sections) if not receiver_section: + if receiver in context.unresolved_types: + current_section = context.parsed_index.find_section_by_current_entry(defn.name) + if current_section and current_section.kind == "method": + score, reasoning = calculate_confidence_score( + defn, + current_section.path_label(), + context.all_sections, + context.section_valid_types, + ) + defn.confidence_score = score + defn.confidence_reasoning = reasoning + entry = 
defn.to_index_entry(current_section.path_label()) + entry.suggested_section = current_section.path_label() + current_section.expected_entries[entry.name] = entry + return _add_unresolved_entry(defn, context.unsorted_methods) return - candidates = [ - child for child in receiver_section.children if child.kind == "method" - ] + candidates: List[IndexSection] = [] + for child in receiver_section.children: + if child.kind == "method": + candidates.append(child) + for grandchild in child.children: + if grandchild.kind == "method": + candidates.append(grandchild) if not candidates: _add_unresolved_entry(defn, context.unsorted_methods) return - category_section = _select_method_section_by_category(defn, receiver, candidates) - if category_section is not None: - defn.confidence_score = 1.0 - defn.confidence_reasoning = ["Category match: structure-first placement"] - _assign_definition_to_section(defn, category_section) - return best_section, best_score, best_reasoning = _best_section_for_definition( defn, candidates, context.all_sections, context.section_valid_types, ) + category_section = _select_method_section_by_category(defn, receiver, candidates) + if category_section is not None: + defn.confidence_score = best_score + defn.confidence_reasoning = list(best_reasoning) + defn.confidence_reasoning.append( + "Category match: structure-first placement (placement override)" + ) + _assign_definition_to_section(defn, category_section) + return defn.confidence_score = best_score defn.confidence_reasoning = best_reasoning if best_section and best_score >= CONFIDENCE_THRESHOLD: @@ -429,6 +644,7 @@ def _place_function_definitions( ) -> None: for defn in [d for d in definitions if d.kind == "func"]: related_section = _find_related_section(defn, context.type_sections) + related_section = adjust_related_section_for_function(defn, related_section) candidates = context.func_sections if related_section: related_candidates = [ @@ -442,6 +658,16 @@ def _place_function_definitions( 
context.all_sections, context.section_valid_types, ) + if not best_section: + current_section = context.parsed_index.find_section_by_current_entry(defn.name) + if current_section and current_section.kind == "func": + best_score, best_reasoning = calculate_confidence_score( + defn, + current_section.path_label(), + context.all_sections, + context.section_valid_types, + ) + best_section = current_section defn.confidence_score = best_score defn.confidence_reasoning = best_reasoning if best_section and best_score >= CONFIDENCE_THRESHOLD: @@ -464,9 +690,22 @@ def _emit_verbose_placements( for defn in definitions: section = parsed_index.find_section_by_expected_entry(defn.name) entry = section.expected_entries.get(defn.name) if section else None + if not entry: + for unsorted_path in parsed_index.unsorted_paths: + unsorted_section = parsed_index.sections.get(unsorted_path) + if not unsorted_section: + continue + if defn.name in unsorted_section.expected_entries: + section = unsorted_section + entry = unsorted_section.expected_entries.get(defn.name) + break if entry: reasoning_str = ", ".join(entry.confidence_reasoning) - target_section = entry.suggested_section or entry.current_section + target_section = ( + entry.suggested_section + or entry.current_section + or (section.path_label() if section else None) + ) if entry.confidence_score is None: score_str = "N/A" else: @@ -474,6 +713,17 @@ def _emit_verbose_placements( output.add_verbose_line( f"{defn.name} -> {target_section}: {score_str} ({reasoning_str})" ) + if ( + entry.confidence_score is not None + and entry.confidence_score < CONFIDENCE_THRESHOLD + ): + output.add_warning_line( + ( + f"Low-confidence placement: {defn.name} -> {target_section} " + f"({score_str})" + ), + verbose_only=True, + ) else: output.add_verbose_line( f"{defn.name} -> (no section): 0% (no valid matches)" @@ -496,6 +746,10 @@ def determine_section_placement( # Ensure processing order: types, then methods, then funcs. 
definitions.sort(key=_definition_sort_key) + for defn in definitions: + current_section = parsed_index.find_section_by_current_entry(defn.name) + defn.current_section = current_section.path_label() if current_section else None + all_sections = set(parsed_index.section_order) _populate_section_valid_types(list(parsed_index.sections.values())) section_valid_types = { diff --git a/scripts/lib/go_defs_index/_go_defs_index_matching_helpers.py b/scripts/lib/go_defs_index/_go_defs_index_matching_helpers.py new file mode 100644 index 00000000..97a42157 --- /dev/null +++ b/scripts/lib/go_defs_index/_go_defs_index_matching_helpers.py @@ -0,0 +1,28 @@ +""" +Helpers for Go defs index matching adjustments. +""" + +from __future__ import annotations + +from typing import Optional + +from lib.go_defs_index._go_defs_index_models import DetectedDefinition +from lib._index_utils import IndexSection + + +def adjust_related_section_for_function( + defn: DetectedDefinition, + related_section: Optional[IndexSection], +) -> Optional[IndexSection]: + if not related_section: + return related_section + name_lower = defn.name.lower() + if "fileentrytag" in name_lower: + return None + if defn.name == "NewPackageError": + return None + if "packagewithoptions" in name_lower: + return None + if name_lower in {"readheaderfrompath", "setdestpath"}: + return None + return related_section diff --git a/scripts/lib/go_defs_index/_go_defs_index_ordering.py b/scripts/lib/go_defs_index/_go_defs_index_ordering.py index 8ba2d262..0c7f0fbc 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_ordering.py +++ b/scripts/lib/go_defs_index/_go_defs_index_ordering.py @@ -1,8 +1,9 @@ from __future__ import annotations +from pathlib import Path from typing import Optional from lib._index_utils import ParsedIndex -from lib._validation_utils import OutputBuilder +from lib._validation_utils import OutputBuilder, ValidationIssue def determine_ordering( @@ -12,5 +13,56 @@ def determine_ordering( """ Phase 6: Determine 
correct ordering within sections. """ - _ = output + if output: + _emit_ordering_warnings(parsed_index, output) parsed_index.sort_expected_entries() + + +def _emit_ordering_warnings( + parsed_index: ParsedIndex, + output: OutputBuilder, +) -> None: + max_warnings_per_section = 5 + index_path = Path("api_go_defs_index.md") + for section_path in parsed_index.section_order: + section = parsed_index.sections.get(section_path) + if not section or len(section.entries) < 2: + continue + expected = sorted(section.entries, key=lambda entry: entry.sort_key()) + if [entry.name for entry in section.entries] == [entry.name for entry in expected]: + continue + mismatches = 0 + for idx, entry in enumerate(section.entries): + if idx >= len(expected): + break + expected_entry = expected[idx] + if entry.name == expected_entry.name: + continue + if entry.entry_status in (None, "present"): + entry.entry_status = "reordered" + expected_index_entry = section.expected_entries.get(entry.name) + if expected_index_entry and expected_index_entry.entry_status in (None, "present"): + expected_index_entry.entry_status = "reordered" + suggestion = ( + "Reorder entries to maintain alphabetical ordering by name." + ) + message = ( + f"`{entry.raw_name}` appears before `{expected_entry.raw_name}`" + ) + output.add_warning_line( + ValidationIssue.create( + "Incorrect entry order", + index_path, + entry.line_number, + entry.line_number, + message=message, + severity="warning", + suggestion=suggestion, + ).format_message(no_color=output.no_color) + ) + mismatches += 1 + if mismatches >= max_warnings_per_section: + output.add_warning_line( + f"WARNING: Additional ordering issues in '{section_path}' omitted." 
+ ) + break diff --git a/scripts/lib/go_defs_index/_go_defs_index_reporting.py b/scripts/lib/go_defs_index/_go_defs_index_reporting.py index 2196b8d2..e8343dba 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_reporting.py +++ b/scripts/lib/go_defs_index/_go_defs_index_reporting.py @@ -65,6 +65,154 @@ def get_section_display_name(section_name: str) -> str: return section_name +def _emit_added_entries( + output: OutputBuilder, + entries: list, +) -> None: + output.add_blank_line("error") + output.add_line( + f"Found {len(entries)} high-confidence sorted definition(s) not in index:", + section="error", + ) + output.add_blank_line("error") + for entry in sorted(entries, key=lambda item: item.name): + section_str = entry.current_section or "(unresolved)" + confidence_str = ( + f"{int(entry.confidence_score * 100)}%" + if entry.confidence_score is not None + else "N/A" + ) + canonical_str = entry.link_target() if entry.link_file else "(no canonical link)" + output.add_error_line(f" {entry.name}") + output.add_error_line(f" - Kind: {entry.kind}") + if entry.source_file: + output.add_error_line(f" - File: {entry.source_file}:{entry.source_line}") + output.add_error_line( + f" - Suggested section: {section_str} (confidence: {confidence_str})" + ) + output.add_error_line(f" - Canonical location: {canonical_str}") + output.add_blank_line("error") + + +def _emit_orphaned_entries( + output: OutputBuilder, + entries: list, + index_file_name: str, +) -> None: + output.add_line( + f"Found {len(entries)} orphaned entry/entries in index:", + section="error", + ) + output.add_blank_line("error") + for entry in sorted(entries, key=lambda e: e.name): + output.add_error_line( + ValidationIssue.create( + "Orphaned entry", + Path(index_file_name), + entry.line_number, + entry.line_number, + message=f"`{entry.name}` not found in any tech spec file", + severity="error", + ).format_message(no_color=output.no_color) + ) + output.add_blank_line("error") + + +def _emit_moved_entries( + 
output: OutputBuilder, + entries: list, + parsed_index: "ParsedIndex", + index_file_name: str, +) -> None: + output.add_line( + f"Found {len(entries)} definition(s) in wrong section:", + section="error", + ) + output.add_blank_line("error") + for entry in sorted(entries, key=lambda e: e.name): + current_section = parsed_index.find_section_by_current_entry(entry.name) + current_path = current_section.path_label() if current_section else "(unknown)" + output.add_error_line( + ValidationIssue.create( + "Wrong section", + Path(index_file_name), + entry.line_number, + entry.line_number, + message=f"`{entry.name}` in '{current_path}'", + severity="error", + suggestion=f"Move to '{entry.current_section}'", + ).format_message(no_color=output.no_color) + ) + output.add_blank_line("error") + + +def _emit_link_updates( + output: OutputBuilder, + entries: list, + index_file_name: str, +) -> None: + output.add_line( + f"Found {len(entries)} entry/entries with incorrect links:", + section="error", + ) + output.add_blank_line("error") + for entry in sorted(entries, key=lambda item: item.name): + current_link = entry.link_target() + suggested_link = entry.expected_link_file or "" + if entry.expected_link_anchor: + suggested_link = f"{suggested_link}#{entry.expected_link_anchor}" + output.add_error_line( + ValidationIssue.create( + "Incorrect link", + Path(index_file_name), + entry.line_number, + entry.line_number, + message=f"`{entry.name}`: {current_link}", + severity="error", + suggestion=f"Update to: {suggested_link}", + ).format_message(no_color=output.no_color) + ) + output.add_blank_line("error") + + +def _emit_unresolved_entries( + output: OutputBuilder, + entries: list, +) -> None: + output.add_line( + ( + f"Found {len(entries)} definition(s) with low confidence " + f"(< 75%) not in index:" + ), + section="error", + ) + output.add_blank_line("error") + for entry in sorted(entries, key=lambda item: item.name): + confidence_str = ( + f"{int(entry.confidence_score * 100)}%" + 
if entry.confidence_score is not None + else "0%" + ) + reasoning_str = ( + ", ".join(entry.confidence_reasoning) + if entry.confidence_reasoning + else "no matches" + ) + suggested = entry.suggested_section or "(unresolved)" + output.add_error_line(f" {entry.name}") + if entry.source_file: + output.add_error_line(f" - File: {entry.source_file}:{entry.source_line}") + output.add_error_line( + f" - Suggested section: {suggested} (confidence: {confidence_str})" + ) + output.add_error_line(f" - Reasoning: {reasoning_str}") + output.add_error_line( + " - Manual review required - confidence too low " + "for automatic placement" + ) + output.add_blank_line("error") + + def generate_report( parsed_index: "ParsedIndex", output: OutputBuilder, @@ -87,140 +235,25 @@ def generate_report( or link_updates or unresolved_entries ) - if not has_issues: + if not has_issues and not output.verbose: return - output.add_errors_header() + if has_issues: + output.add_errors_header() if added_entries: - output.add_blank_line("error") - output.add_line( - f"Found {len(added_entries)} high-confidence sorted definition(s) not in index:", - section="error", - ) - output.add_blank_line("error") - for entry in sorted(added_entries, key=lambda item: item.name): - section_str = entry.current_section or "(unresolved)" - confidence_str = ( - f"{int(entry.confidence_score * 100)}%" - if entry.confidence_score is not None - else "N/A" - ) - canonical_str = entry.link_target() if entry.link_file else "(no canonical link)" - output.add_error_line(f" {entry.name}") - output.add_error_line(f" - Kind: {entry.kind}") - if entry.source_file: - output.add_error_line(f" - File: {entry.source_file}:{entry.source_line}") - output.add_error_line( - f" - Suggested section: {section_str} (confidence: {confidence_str})" - ) - output.add_error_line(f" - Canonical location: {canonical_str}") - output.add_blank_line("error") + _emit_added_entries(output, added_entries) if orphaned_entries: - output.add_line( - f"Found 
{len(orphaned_entries)} orphaned entry/entries in index:", - section="error", - ) - output.add_blank_line("error") - for entry in sorted(orphaned_entries, key=lambda e: e.name): - output.add_error_line( - ValidationIssue( - "Orphaned entry", - Path(index_file_name), - entry.line_number, - entry.line_number, - f"`{entry.name}` not found in any tech spec file", - severity="error", - ).format_message(no_color=output.no_color) - ) - output.add_blank_line("error") + _emit_orphaned_entries(output, orphaned_entries, index_file_name) if moved_entries: - output.add_line( - f"Found {len(moved_entries)} definition(s) in wrong section:", - section="error", - ) - output.add_blank_line("error") - for entry in sorted(moved_entries, key=lambda e: e.name): - current_section = parsed_index.find_section_by_current_entry(entry.name) - current_path = current_section.path_label() if current_section else "(unknown)" - output.add_error_line( - ValidationIssue( - "Wrong section", - Path(index_file_name), - entry.line_number, - entry.line_number, - f"`{entry.name}` in '{current_path}'", - severity="error", - suggestion=f"Move to '{entry.current_section}'", - ).format_message(no_color=output.no_color) - ) - output.add_blank_line("error") + _emit_moved_entries(output, moved_entries, parsed_index, index_file_name) if link_updates: - output.add_line( - f"Found {len(link_updates)} entry/entries with incorrect links:", - section="error", - ) - output.add_blank_line("error") - for entry in sorted(link_updates, key=lambda item: item.name): - current_link = entry.link_target() - suggested_link = entry.expected_link_file or "" - if entry.expected_link_anchor: - suggested_link = f"{suggested_link}#{entry.expected_link_anchor}" - output.add_error_line( - ValidationIssue( - "Incorrect link", - Path(index_file_name), - entry.line_number, - entry.line_number, - f"`{entry.name}`: {current_link}", - severity="error", - suggestion=f"Update to: {suggested_link}", - ).format_message(no_color=output.no_color) - 
) - output.add_blank_line("error") + _emit_link_updates(output, link_updates, index_file_name) if unresolved_entries: - output.add_line( - ( - f"Found {len(unresolved_entries)} definition(s) with low confidence " - f"(< 75%) not in index:" - ), - section="error", - ) - output.add_blank_line("error") - for entry in sorted(unresolved_entries, key=lambda item: item.name): - confidence_str = ( - f"{int(entry.confidence_score * 100)}%" - if entry.confidence_score is not None - else "0%" - ) - reasoning_str = ( - ", ".join(entry.confidence_reasoning) - if entry.confidence_reasoning - else "no matches" - ) - suggested = entry.suggested_section or "(unresolved)" - output.add_error_line(f" {entry.name}") - if entry.source_file: - output.add_error_line(f" - File: {entry.source_file}:{entry.source_line}") - output.add_error_line( - f" - Suggested section: {suggested} (confidence: {confidence_str})" - ) - output.add_error_line(f" - Reasoning: {reasoning_str}") - output.add_error_line( - " - Manual review required - confidence too low " - "for automatic placement" - ) - output.add_blank_line("error") - - output.add_blank_line("error") - output.add_line("Expected index (full tree):", section="error") - output.add_blank_line("error") - for line in parsed_index.render_full_tree(): - output.add_error_line(line) - output.add_blank_line("error") + _emit_unresolved_entries(output, unresolved_entries) # End of module. 
diff --git a/scripts/lib/go_defs_index/_go_defs_index_scoring.py b/scripts/lib/go_defs_index/_go_defs_index_scoring.py index 206e95da..5f015be9 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_scoring.py +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring.py @@ -22,18 +22,53 @@ score_kind_positive_match, score_strict_kind_matching, ) +from lib.go_defs_index._go_defs_index_scoring_rules_core_domain import ( + score_type_name_patterns, +) from lib.go_defs_index._go_defs_index_scoring_rules_methods import ( + score_file_entry_method_categories, score_method_name_preferences, score_method_patterns, score_method_type_classification, ) from lib.go_defs_index._go_defs_index_scoring_rules_penalties import ( + score_error_helper_functions, + score_error_methods, score_error_domain_match, + score_error_context_types, + score_error_context_domain_mismatch, score_general_heuristics, score_hash_optional_types, score_kind_section_map, + score_metadata_tag_helpers, + score_package_config_preference, + score_readonly_package_interface, + score_readonly_type_preference, + score_streaming_helper_functions, + score_streaming_helper_mismatch, score_type_operation_penalty, ) +from lib.go_defs_index._go_defs_index_scoring_rules_type_keywords import ( + score_create_options_preference, + score_error_type_keywords, + score_file_entry_type_keywords, + score_file_info_preference, + score_generic_helper_functions, + score_generic_type_keywords, + score_metadata_type_keywords, + score_other_type_helper_functions, + score_other_types_suffix, + score_recovery_file_header_preference, + score_signature_type_keywords, + score_generic_core_type_preference, + score_package_error_constructor, + score_signature_comment_helpers, + score_security_error_context_types, + score_package_open_helpers, + score_package_read_header_helpers, + score_metadata_destpath_helpers, + score_package_helper_overrides, +) from lib.go_defs_index._go_defs_index_scoring_rules_sections import ( score_camelcase_match, 
score_comment_domain_match, @@ -112,6 +147,18 @@ def calculate_confidence_score( score += delta reasoning.extend(delta_reasoning) + delta, delta_reasoning = score_type_name_patterns(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_error_context_types(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_error_context_domain_mismatch(ctx) + score += delta + reasoning.extend(delta_reasoning) + delta, delta_reasoning = score_domain_type_subsection(ctx) score += delta reasoning.extend(delta_reasoning) @@ -160,6 +207,10 @@ def calculate_confidence_score( score += delta reasoning.extend(delta_reasoning) + delta, delta_reasoning = score_file_entry_method_categories(ctx) + score += delta + reasoning.extend(delta_reasoning) + delta, delta_reasoning = score_hash_optional_types(ctx) score += delta reasoning.extend(delta_reasoning) @@ -168,6 +219,114 @@ def calculate_confidence_score( score += delta reasoning.extend(delta_reasoning) + delta, delta_reasoning = score_error_methods(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_metadata_tag_helpers(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_error_helper_functions(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_package_config_preference(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_other_types_suffix(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_generic_type_keywords(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_metadata_type_keywords(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_streaming_helper_functions(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = 
score_streaming_helper_mismatch(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_generic_helper_functions(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_create_options_preference(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_signature_type_keywords(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_error_type_keywords(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_file_entry_type_keywords(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_file_info_preference(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_recovery_file_header_preference(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_other_type_helper_functions(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_generic_core_type_preference(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_package_error_constructor(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_signature_comment_helpers(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_security_error_context_types(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_package_open_helpers(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_package_read_header_helpers(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_metadata_destpath_helpers(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_package_helper_overrides(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, 
delta_reasoning = score_readonly_type_preference(ctx) + score += delta + reasoning.extend(delta_reasoning) + + delta, delta_reasoning = score_readonly_package_interface(ctx) + score += delta + reasoning.extend(delta_reasoning) + delta, delta_reasoning = score_type_operation_penalty(ctx) score += delta reasoning.extend(delta_reasoning) diff --git a/scripts/lib/go_defs_index/_go_defs_index_scoring_domain.py b/scripts/lib/go_defs_index/_go_defs_index_scoring_domain.py index 3aa9b7c0..5960825e 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_scoring_domain.py +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring_domain.py @@ -8,10 +8,7 @@ from typing import List, Optional, Tuple from lib._go_code_utils import normalize_generic_name -from lib.go_defs_index._go_defs_index_config import ( - DOMAIN_FILE_MAP, - KEYWORD_TO_SECTION_MAPPING, -) +from lib.go_defs_index._go_defs_index_config import KEYWORD_TO_SECTION_MAPPING from lib.go_defs_index._go_defs_index_models import DetectedDefinition _DOMAIN_SUFFIXES: Tuple[str, ...] 
= ("strategy", "builder", "validator") @@ -24,16 +21,28 @@ } _DOMAIN_FALLBACK_KEYWORDS = { "metadata": ["metadata", "comment", "tag", "pathmetadata", "fileentrytag"], - "compression": ["compression", "compress", "decompress"], - "encryption": ["encryption", "encrypt", "decrypt", "aes", "chacha", "mlkem", "cipher"], - "signature": ["signature", "sign"], - "streaming": ["streaming", "stream", "buffer", "chunk"], - "deduplication": ["deduplication", "dedup"], - "package": ["package"], - "concurrency": ["concurrency", "thread", "worker", "safety"], + "compression": ["compression", "compress", "decompress", "lz4", "lzma", "zstd"], + "encryption": [ + "encryption", + "encrypt", + "decrypt", + "aes", + "chacha", + "mlkem", + "cipher", + "key", + ], + "signature": ["signature", "sign", "signing", "certificate", "x509"], + "streaming": ["streaming", "stream", "buffer", "chunk", "pool"], + "deduplication": ["deduplication", "dedup", "hash"], + "package": ["package", "filepackage", "readonly"], + "concurrency": ["concurrency", "thread", "worker", "safety", "job"], "extraction": ["extract", "extraction"], - "creation": ["create", "creation"], - "filetype": ["filetype"], + "creation": ["create", "creation", "new"], + "filetype": ["filetype", "mimetype"], + "error": ["error", "errorcontext", "errtype"], + "fileentry": ["fileentry", "filesource", "fileinfo"], + "generic": ["generic", "option", "result"], } _SECTION_DOMAIN_RULES = [ ("metadata", ["package metadata", "metadata"]), @@ -44,15 +53,67 @@ ("signature", ["signature", "sign"]), ("deduplication", ["deduplication", "dedup"]), ("filetype", ["filetype"]), - ("writing", ["packagewriter", "package writing"]), + ("writing", ["package write methods", "package writing"]), ("package", ["package"]), ] +_FILE_DOMAIN_PATTERNS = [ + # Specific patterns first (more specific wins) + ("file_mgmt_error", "error"), + ("file_mgmt_compression", "compression"), + ("file_mgmt", "fileentry"), + ("file_type", "filetype"), + ("basic_operation", 
"package"), + ("core", "package"), + ("file_format", "package"), + # Domain keywords in file name + ("compression", "compression"), + ("streaming", "streaming"), + ("security", "encryption"), + ("encryption", "encryption"), + ("signature", "signature"), + ("metadata", "metadata"), + ("deduplication", "deduplication"), + ("generic", "generic"), + ("writing", "writing"), + ("error", "error"), +] + def _name_has_any(name_lower: str, keywords: List[str]) -> bool: return any(keyword in name_lower for keyword in keywords) +def _file_tokens(file_name: str) -> List[str]: + normalized = file_name.lower().replace(".md", "").replace("-", "_") + return [token for token in normalized.split("_") if token] + + +def _tokens_contain_sequence(tokens: List[str], pattern_tokens: List[str]) -> bool: + if not tokens or not pattern_tokens: + return False + start_index = 0 + for token in pattern_tokens: + try: + found_index = tokens.index(token, start_index) + except ValueError: + return False + start_index = found_index + 1 + return True + + +def _domain_from_file_pattern(file_name: str) -> Optional[str]: + """Infer domain from file name using token pattern matching.""" + if not file_name: + return None + tokens = _file_tokens(file_name) + for pattern, domain in _FILE_DOMAIN_PATTERNS: + pattern_tokens = _file_tokens(pattern) + if _tokens_contain_sequence(tokens, pattern_tokens): + return domain + return None + + def _infer_domain_from_section_pattern(pattern: str) -> Optional[str]: pattern_lower = pattern.lower().strip() for domain, keywords in _SECTION_DOMAIN_RULES: @@ -125,8 +186,9 @@ def detect_definition_domain(definition: DetectedDefinition, name_lower: str) -> signature_domain = _domain_from_signature(definition, normalized_lower) if signature_domain: return signature_domain - if definition.file in DOMAIN_FILE_MAP: - return DOMAIN_FILE_MAP[definition.file] + file_domain = _domain_from_file_pattern(definition.file) + if file_domain: + return file_domain generic_domain = 
_domain_from_generic(definition, normalized_name) if generic_domain: return generic_domain diff --git a/scripts/lib/go_defs_index/_go_defs_index_scoring_domain_test.py b/scripts/lib/go_defs_index/_go_defs_index_scoring_domain_test.py new file mode 100644 index 00000000..033d1481 --- /dev/null +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring_domain_test.py @@ -0,0 +1,95 @@ +#!/usr/bin/env python3 +"""Tests for Go defs index scoring helpers.""" + +from __future__ import annotations + +import unittest + +from ._go_defs_index_models import DetectedDefinition +from ._go_defs_index_scoring_domain import _domain_from_file_pattern, detect_definition_domain +from ._go_defs_index_scoring_rules_core_base import ( + ScoringContext, + score_exact_name_match, +) +from ._go_defs_index_scoring_rules_core_domain import score_type_name_patterns +from ._go_defs_index_scoring_rules_penalties import score_error_context_types +from ._go_defs_index_scoring_rules_sections import score_file_patterns + + +def _make_definition(name: str, kind: str, file_name: str) -> DetectedDefinition: + return DetectedDefinition( + name=name, + kind=kind, + file=file_name, + code_block_start_line=1, + code_block_content="", + raw_name=name, + ) + + +def _make_context(definition: DetectedDefinition, section: str) -> ScoringContext: + return ScoringContext( + definition=definition, + section=section, + all_sections=set(), + section_valid_types=None, + section_lower=section.lower(), + name_lower=definition.name.lower(), + heading_lower=definition.heading.lower() if definition.heading else "", + content_lower=definition.section_content.lower(), + detected_domain=None, + ) + + +class ScoringDomainTests(unittest.TestCase): + """Tests for domain detection and file pattern helpers.""" + + def test_domain_from_file_pattern(self) -> None: + cases = { + "api_file_mgmt_error.md": "error", + "api_file_mgmt_compression.md": "compression", + "api_file_mgmt_queries.md": "fileentry", + "api_basic_operations.md": 
"package", + "package_file_format.md": "package", + "file_type_system.md": "filetype", + } + for file_name, expected in cases.items(): + with self.subTest(file_name=file_name): + self.assertEqual(_domain_from_file_pattern(file_name), expected) + + def test_detect_definition_domain_fallback(self) -> None: + definition = _make_definition("WorkerPool", "type", "api_misc.md") + detected = detect_definition_domain(definition, definition.name.lower()) + self.assertEqual(detected, "concurrency") + + def test_score_file_patterns(self) -> None: + definition = _make_definition("FileEntry", "type", "api_file_mgmt_removal.md") + ctx = _make_context(definition, "4. FileEntry Types") + score, reasoning = score_file_patterns(ctx) + self.assertGreater(score, 0.0) + self.assertTrue(any("File pattern match" in r for r in reasoning)) + + def test_score_type_name_patterns(self) -> None: + definition = _make_definition("CompressionConfig", "type", "api_package_compression.md") + ctx = _make_context(definition, "6. Compression Types") + score, reasoning = score_type_name_patterns(ctx) + self.assertGreater(score, 0.0) + self.assertTrue(any("Type pattern" in r for r in reasoning)) + + def test_score_error_context_types(self) -> None: + definition = _make_definition("CompressionErrorContext", "type", "api_core.md") + ctx = _make_context(definition, "13. Error Types") + score, reasoning = score_error_context_types(ctx) + self.assertGreater(score, 0.0) + self.assertTrue(any("ErrorContext type" in r for r in reasoning)) + + def test_score_exact_name_match_interface_types(self) -> None: + definition = _make_definition("Package", "type", "api_core.md") + ctx = _make_context(definition, "1. 
Package Interface Types") + score, reasoning = score_exact_name_match(ctx) + self.assertGreater(score, 0.0) + self.assertTrue(any("Exact type name match" in r for r in reasoning)) + + +if __name__ == "__main__": + unittest.main() diff --git a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_base.py b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_base.py index 83675bee..ee3f75ca 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_base.py +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_base.py @@ -67,9 +67,14 @@ def _extract_primary_name_from_section(section_name: str) -> str: "definition", "definitions", ] - for suffix in suffixes: - if leaf_lower.endswith(" " + suffix): - leaf_lower = leaf_lower[: -(len(suffix) + 1)].strip() + while True: + removed = False + for suffix in suffixes: + if leaf_lower.endswith(" " + suffix): + leaf_lower = leaf_lower[: -(len(suffix) + 1)].strip() + removed = True + break + if not removed: break leaf_lower = re.sub(r"[^a-z0-9]", "", leaf_lower) return leaf_lower @@ -81,7 +86,7 @@ def _is_core_package_type( receiver_type: Optional[str] = None, ) -> bool: name_lower_check = name.lower() - core_package_types = ["package", "packagereader", "packagewriter", "filepackage"] + core_package_types = ["package", "filepackage"] if name_lower_check in core_package_types: return True if kind == "method" and receiver_type: @@ -210,6 +215,9 @@ def score_exact_name_match(ctx: ScoringContext) -> Tuple[float, List[str]]: if mapped_norm and section_primary and mapped_norm == section_primary: score += 0.60 reasoning.append(f"Exact type name match ({mapped_name}): +60%") + if "interface" in section_leaf_lower: + score += 0.15 + reasoning.append("Interface Types section boosts exact type match: +15%") return score, reasoning @@ -217,6 +225,8 @@ def score_exact_name_match(ctx: ScoringContext) -> Tuple[float, List[str]]: def score_implementation_mapping(ctx: ScoringContext) -> Tuple[float, 
List[str]]: if ctx.definition.kind not in ("type", "struct"): return 0.0, [] + if "readonly" in ctx.name_lower: + return 0.0, [] score = 0.0 reasoning: List[str] = [] extracted_interface = extract_implementation_interface(ctx.definition) @@ -258,6 +268,8 @@ def _score_error_section_penalties(ctx: ScoringContext) -> Tuple[float, List[str ) if not is_error_section: return 0.0, [], False + if ctx.definition.kind == "method" and "error methods" in ctx.section_lower: + return 0.0, [], False if ctx.definition.kind in ("method", "func"): score -= 0.5 reasoning.append(f"Non-error {ctx.definition.kind} in Error Types section: -50%") diff --git a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_domain.py b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_domain.py index e9c3ce43..b6e6c22a 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_domain.py +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_domain.py @@ -124,7 +124,7 @@ def _score_domain_creation(ctx: ScoringContext) -> Tuple[float, List[str]]: def _score_domain_extraction(ctx: ScoringContext) -> Tuple[float, List[str]]: - keywords = ["extract", "fileentry", "packagereader"] + keywords = ["extract", "fileentry"] if not _section_contains_any(ctx.section_lower, keywords): return 0.0, [] msg = "Domain match: extraction-related => Extraction/FileEntry section: +20%" @@ -159,9 +159,12 @@ def _score_domain_streaming(ctx: ScoringContext) -> Tuple[float, List[str]]: def _score_domain_deduplication(ctx: ScoringContext) -> Tuple[float, List[str]]: - if "deduplication" not in ctx.section_lower: + keywords = ["file management", "information and queries", "package file management"] + if not _section_contains_any(ctx.section_lower, keywords): return 0.0, [] - return 0.30, ["Domain match: deduplication-related => Deduplication Types: +30%"] + return 0.20, [ + "Domain match: deduplication-related => Package file management/queries: +20%" + ] def _score_domain_filetype(ctx: 
ScoringContext) -> Tuple[float, List[str]]: @@ -171,9 +174,9 @@ def _score_domain_filetype(ctx: ScoringContext) -> Tuple[float, List[str]]: def _score_domain_writing(ctx: ScoringContext) -> Tuple[float, List[str]]: - if not _section_contains_any(ctx.section_lower, ["packagewriter", "writing"]): + if not _section_contains_any(ctx.section_lower, ["package write methods", "writing"]): return 0.0, [] - return 0.20, ["Domain match: writing-related => PackageWriter section: +20%"] + return 0.20, ["Domain match: writing-related => Package Write Methods: +20%"] def score_domain_match(ctx: ScoringContext) -> Tuple[float, List[str]]: @@ -200,6 +203,36 @@ def score_domain_match(ctx: ScoringContext) -> Tuple[float, List[str]]: return scorer(ctx) +def score_type_name_patterns(ctx: ScoringContext) -> Tuple[float, List[str]]: + """Score based on type name patterns like *Config, *Builder, *Strategy.""" + if ctx.definition.kind not in ("type", "struct", "interface"): + return 0.0, [] + + name_lower = ctx.name_lower + patterns = { + "config": (["compression", "encryption", "streaming", "signature", "package"], 0.15), + "builder": (["compression", "encryption", "config", "signature", "streaming"], 0.10), + "strategy": (["compression", "encryption", "signature", "streaming"], 0.15), + "validator": (["compression", "encryption", "validation", "signature"], 0.10), + "handler": (["encryption", "file"], 0.10), + "pool": (["buffer", "compression", "resource", "streaming", "worker"], 0.10), + "errorcontext": (["error"], 0.20), + "options": (["file", "package", "compression", "extraction"], 0.10), + "info": (["compression", "file", "package", "signature"], 0.10), + } + + for suffix, (domain_keywords, bonus) in patterns.items(): + if not name_lower.endswith(suffix): + continue + for keyword in domain_keywords: + if keyword in ctx.section_lower: + reason = ( + f"Type pattern '*{suffix}' matches section domain: +{int(bonus * 100)}%" + ) + return bonus, [reason] + return 0.0, [] + + def 
score_domain_type_subsection(ctx: ScoringContext) -> Tuple[float, List[str]]: if ctx.definition.kind not in ("type", "struct"): return 0.0, [] @@ -224,7 +257,7 @@ def score_domain_type_subsection(ctx: ScoringContext) -> Tuple[float, List[str]] "creation": ["create", "creation"], "generic": ["generic"], "filetype": ["filetype"], - "writing": ["write", "writing", "packagewriter"], + "writing": ["write", "writing"], } for keyword in domain_keywords: if keyword in ctx.name_lower: diff --git a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_methods.py b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_methods.py index eea18ce2..267e9e9b 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_methods.py +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_methods.py @@ -95,10 +95,13 @@ def _method_flags(method_name: str) -> Tuple[bool, bool]: def _score_getter_section(ctx: ScoringContext, is_getter: bool) -> Tuple[float, List[str]]: if not is_getter: return 0.0, [] - if "data management" in ctx.section_lower: - msg = "Getter method (Get/Is/Has) matches Data Management Methods: +25%" + if "query methods" in ctx.section_lower: + msg = "Getter method (Get/Is/Has) matches Query Methods: +25%" return 0.25, [msg] - if "transformation" in ctx.section_lower: + if "data methods" in ctx.section_lower: + msg = "Getter method (Get/Is/Has) matches Data Methods: +15%" + return 0.15, [msg] + if "transformation methods" in ctx.section_lower: msg = "Getter method (Get/Is/Has) does not match Transformation Methods: -25%" return -0.25, [msg] return 0.0, [] @@ -110,16 +113,16 @@ def _score_transformation_section( ) -> Tuple[float, List[str]]: if not is_transformation: return 0.0, [] - if "transformation" in ctx.section_lower: + if "transformation methods" in ctx.section_lower: msg = ( "Transformation method (Add/Update/Set/Remove) matches " "Transformation Methods: +25%" ) return 0.25, [msg] - if "data management" in ctx.section_lower: + if "data methods" in 
ctx.section_lower or "query methods" in ctx.section_lower: msg = ( "Transformation method (Add/Update/Set/Remove) does not match " - "Data Management Methods: -25%" + "Query/Data Methods: -25%" ) return -0.25, [msg] return 0.0, [] @@ -187,3 +190,60 @@ def score_method_name_preferences(ctx: ScoringContext) -> Tuple[float, List[str] "Compression Methods: -20%" ) return score, reasoning + + +def score_file_entry_method_categories(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "method" or "." not in ctx.definition.name: + return 0.0, [] + receiver = ctx.definition.receiver_type or ctx.definition.name.split(".", 1)[0] + if receiver.lower() != "fileentry": + return 0.0, [] + method_lower = ctx.definition.name.split(".", 1)[1].lower() + category_rules = [ + ( + "query methods", + ["get", "has", "is"], + ), + ( + "data methods", + ["getdata", "setdata", "loaddata", "unloaddata", "data"], + ), + ( + "temp file methods", + ["tempfile", "temp"], + ), + ( + "serialization methods", + ["marshal", "writedata", "writemeta", "writeto"], + ), + ( + "path methods", + ["path", "symlink", "associate", "resolve"], + ), + ( + "transformation methods", + [ + "compress", + "decompress", + "encrypt", + "decrypt", + "transform", + "process", + "pipeline", + "set", + "unset", + "current", + "original", + "processingstate", + "validate", + "cleanup", + "resume", + "execute", + "copy", + ], + ), + ] + for section_keyword, tokens in category_rules: + if section_keyword in ctx.section_lower and any(token in method_lower for token in tokens): + return 0.40, [f"FileEntry {section_keyword} method matches section: +40%"] + return 0.0, [] diff --git a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_penalties.py b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_penalties.py index f5e86dab..df45e2c1 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_penalties.py +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_penalties.py @@ 
-51,6 +51,45 @@ def score_error_domain_match(ctx: ScoringContext) -> Tuple[float, List[str]]: return 0.0, [] +def score_error_context_types(ctx: ScoringContext) -> Tuple[float, List[str]]: + """Score error context types to place in Error Types section.""" + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if not ctx.name_lower.endswith("errorcontext"): + return 0.0, [] + domain_keywords = [ + "compression", + "encryption", + "signature", + "security", + "stream", + "metadata", + ] + if any(keyword in ctx.name_lower for keyword in domain_keywords): + return 0.0, [] + if "error" in ctx.section_lower and "type" in ctx.section_lower: + return 0.25, ["ErrorContext type matches Error Types section: +25%"] + return 0.0, [] + + +def score_error_context_domain_mismatch(ctx: ScoringContext) -> Tuple[float, List[str]]: + if not ctx.name_lower.endswith("errorcontext"): + return 0.0, [] + if "error types" not in ctx.section_lower: + return 0.0, [] + domain_keywords = [ + "compression", + "encryption", + "signature", + "security", + "stream", + "metadata", + ] + if any(keyword in ctx.name_lower for keyword in domain_keywords): + return -0.40, ["Domain error context should avoid Error Types: -40%"] + return 0.0, [] + + def _infer_penalty_domain(ctx: ScoringContext) -> Optional[str]: if "compression" in ctx.name_lower or "compress" in ctx.name_lower: return "compression" @@ -114,6 +153,8 @@ def _penalty_error_section(ctx: ScoringContext) -> Tuple[float, List[str], bool] ) if not (is_error_section and not is_error_helper): return 0.0, [], False + if ctx.definition.kind == "method" and "error methods" in ctx.section_lower: + return 0.0, [], False if ctx.definition.kind in ("method", "func"): score -= 0.5 reasoning.append(f"Non-error {ctx.definition.kind} in Error Types section: -50%") @@ -185,13 +226,12 @@ def _kind_section_map_matches(ctx: ScoringContext, kind_mismatch: bool) -> Tuple "interface": [ "Core Interfaces", "Type Definitions", - "Metadata Types", + 
"Package Metadata Types", "Generic Types", "Compression Types", "Encryption and Security Types", "Signature Types", "Streaming and Buffer Types", - "Deduplication Types", "FileType System Types", ], "method": [ @@ -199,19 +239,24 @@ def _kind_section_map_matches(ctx: ScoringContext, kind_mismatch: bool) -> Tuple "File Management", "Package Writing", "Package Compression", - "Package Metadata Methods", - "Metadata Methods", + "Package Comment Methods", + "Package Identity Methods", + "Package Special File Methods", + "Package Path Metadata Methods", + "Package Symlink Methods", + "Package Metadata-Only Methods", + "Package Info Methods", + "Package Metadata Validation Methods", + "Package Metadata Internal Methods", "Basic Operations", "Security and Encryption Operations", "Digital Signatures", - "Deduplication", "Streaming and Buffer Management", ], - "type": ["Type Definitions", "Metadata Types"], + "type": ["Type Definitions", "Package Metadata Types", "Interface Types"], "func": [ "Basic Operations", - "Metadata Helper Functions", - "Package Metadata Methods", + "Package Metadata Helper Functions", "Package Helper Functions", "File Management", ], @@ -265,3 +310,101 @@ def score_general_heuristics(ctx: ScoringContext) -> Tuple[float, List[str]]: score -= 0.20 reasoning.append("Comment validation is not encryption/security: -20%") return score, reasoning + + +def score_error_methods(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "method": + return 0.0, [] + if "error methods" not in ctx.section_lower: + return 0.0, [] + if "error" not in ctx.name_lower: + return 0.0, [] + return 0.30, ["Error method matches Error Methods section: +30%"] + + +def score_metadata_tag_helpers(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + name_lower = ctx.name_lower + if "fileentry helper functions" in ctx.section_lower: + if "fileentrytag" in name_lower or name_lower == "newtag": + return 0.45, 
["FileEntry tag helper matches FileEntry Helpers: +45%"] + return 0.0, [] + if ( + "package metadata helper functions" not in ctx.section_lower + and "metadata helper functions" not in ctx.section_lower + ): + return 0.0, [] + if "pathmetatag" in name_lower: + return 0.45, ["Path metadata tag helper matches Metadata Helpers: +45%"] + return 0.0, [] + + +def score_error_helper_functions(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if "error helper functions" not in ctx.section_lower: + return 0.0, [] + if ctx.name_lower == "newpackageerror": + return 0.0, [] + if "error" not in ctx.name_lower and not ctx.name_lower.startswith("err"): + return 0.0, [] + return 0.45, ["Error helper function matches Error Helper Functions: +45%"] + + +def score_streaming_helper_functions(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if "streaming and buffer helper functions" not in ctx.section_lower: + return 0.0, [] + if ctx.definition.file == "api_streaming.md" or "stream" in ctx.name_lower: + return 0.60, ["Streaming helper function matches Streaming Helper Functions: +60%"] + return 0.0, [] + + +def score_streaming_helper_mismatch(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if ctx.definition.file != "api_streaming.md": + return 0.0, [] + if "compression helper functions" not in ctx.section_lower: + return 0.0, [] + return -0.30, ["Streaming helper function should not be in Compression Helpers: -30%"] + + +def score_package_config_preference(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if ctx.name_lower != "packageconfig": + return 0.0, [] + if "package metadata types" in ctx.section_lower: + return 0.30, ["PackageConfig prefers Package Metadata Types: +30%"] + if "other types" in ctx.section_lower: + return -0.30, ["PackageConfig 
avoids Other Types: -30%"] + if "package interface types" in ctx.section_lower: + return -0.30, ["PackageConfig avoids Package Interface Types: -30%"] + return 0.0, [] + + +def score_readonly_type_preference(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct", "interface"): + return 0.0, [] + if ctx.name_lower == "readonlypackage": + return 0.0, [] + if "readonly" not in ctx.name_lower: + return 0.0, [] + if "other types" in ctx.section_lower: + return 0.60, ["Read-only types prefer Other Types section: +60%"] + if "package interface" in ctx.section_lower: + return -0.80, ["Read-only types avoid Package Interface Types: -80%"] + return 0.0, [] + + +def score_readonly_package_interface(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct", "interface"): + return 0.0, [] + if ctx.name_lower != "readonlypackage": + return 0.0, [] + if "package interface types" in ctx.section_lower: + return 0.40, ["readOnlyPackage prefers Package Interface Types: +40%"] + return 0.0, [] diff --git a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_sections.py b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_sections.py index e8f26c58..031901d4 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_sections.py +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_sections.py @@ -23,52 +23,174 @@ from lib.go_defs_index._go_defs_index_shared import map_implementation_to_interface -def score_file_patterns(ctx: ScoringContext) -> Tuple[float, List[str]]: - file_patterns = { - "api_core.md": [ +_FILE_SECTION_PATTERNS = [ + ("file_mgmt_error", ["Error Types"]), + ("file_mgmt_compression", ["Compression Types", "FileEntry Types"]), + ( + "file_mgmt", + [ + "FileEntry Types", + "FileEntry Query Methods", + "FileEntry Data Methods", + "FileEntry Temp File Methods", + "FileEntry Serialization Methods", + "FileEntry Path Methods", + "FileEntry Transformation Methods", + 
"FileEntry Helper Functions", + "Tag Methods", + ], + ), + ( + "core", + [ "Package Interface Types", - "PackageReader Interface Types", - "PackageWriter Interface Types", + "Package Lifecycle Methods", + "Package File Management Methods", + "Package Information and Queries Methods", + "Package Comment Methods", + "Package Identity Methods", + "Package Special File Methods", + "Package Path Metadata Methods", + "Package Symlink Methods", + "Package Metadata-Only Methods", + "Package Info Methods", + "Package Metadata Validation Methods", + "Package Metadata Internal Methods", + "Package Compression Methods", + "Package Path and Configuration Methods", + "Package File Encryption Methods", + "Package Signature Management Methods", + "Package Write Methods", + "Package Other Methods", + "Package Helper Functions", "Error Types", ], - "api_basic_operations.md": [ + ), + ( + "basic_operation", + [ "Package Interface Types", - "Package Methods", + "Package Lifecycle Methods", + "Package File Management Methods", + "Package Information and Queries Methods", "Package Helper Functions", ], - "package_file_format.md": [ + ), + ( + "basic_operations", + [ "Package Interface Types", - "Package Methods", + "Package Lifecycle Methods", + "Package File Management Methods", + "Package Information and Queries Methods", "Package Helper Functions", ], - "api_file_management.md": ["FileEntry Types", "FileEntry Methods"], - "api_file_mgmt_index.md": ["FileEntry Types", "FileEntry Methods"], - "api_file_mgmt_file_entry.md": ["FileEntry Types", "FileEntry Methods"], - "api_file_mgmt_addition.md": ["FileEntry Types", "FileEntry Methods"], - "api_file_mgmt_extraction.md": ["FileEntry Types", "FileEntry Methods"], - "api_file_mgmt_removal.md": ["FileEntry Types", "FileEntry Methods"], - "api_file_mgmt_updates.md": ["FileEntry Types", "FileEntry Methods"], - "api_file_mgmt_queries.md": ["FileEntry Types", "FileEntry Methods"], - "api_file_mgmt_compression.md": ["FileEntry Types", "FileEntry 
Methods"], - "api_metadata.md": ["Package Interface Types", "Package Methods", "Metadata Types"], - "api_package_compression.md": [ + ), + ( + "file_format", + [ "Package Interface Types", - "Package Methods", - "Compression Types", + "Package Information and Queries Methods", + "Package File Management Methods", ], - "api_streaming.md": ["Streaming and Buffer Types"], - "api_security.md": ["Encryption and Security Types"], - "api_generics.md": ["Generic Types"], - "api_writing.md": ["PackageWriter Interface Types", "PackageWriter Methods"], - "api_deduplication.md": ["Deduplication Types"], - "api_signatures.md": ["Signature Types"], - "file_type_system.md": ["FileType System Types"], - } - if ctx.definition.file not in file_patterns: + ), + ("compression", ["Compression Types", "Compression Methods", "Compression Helper Functions"]), + ( + "streaming", + [ + "Streaming and Buffer Types", + "Streaming and Buffer Methods", + "Streaming and Buffer Helper Functions", + ], + ), + ( + "security", + [ + "Encryption and Security Types", + "Encryption and Security Methods", + "Encryption and Security Helper Functions", + ], + ), + ( + "encryption", + [ + "Encryption and Security Types", + "Encryption and Security Methods", + "Encryption and Security Helper Functions", + ], + ), + ("signature", ["Signature Types", "Signature Methods", "Signature Helper Functions"]), + ( + "metadata", + [ + "Package Metadata Types", + "Package Comment Methods", + "Package Identity Methods", + "Package Special File Methods", + "Package Path Metadata Methods", + "Package Symlink Methods", + "Package Metadata-Only Methods", + "Package Info Methods", + "Package Metadata Validation Methods", + "Package Metadata Internal Methods", + "Package Metadata Type Methods", + "Package Metadata Helper Functions", + "Package Interface Types", + ], + ), + ( + "deduplication", + [ + "Package File Management Methods", + "Package Information and Queries Methods", + ], + ), + ("generic", ["Generic Types", 
"Generic Methods", "Generic Helper Functions"]), + ( + "writing", + [ + "Package Write Methods", + "Package Helper Functions", + ], + ), + ( + "file_type", + ["FileType System Types", "FileType System Methods", "FileType System Helper Functions"], + ), +] + + +def _file_tokens(file_name: str) -> List[str]: + normalized = file_name.lower().replace(".md", "").replace("-", "_") + return [token for token in normalized.split("_") if token] + + +def _tokens_contain_sequence(tokens: List[str], pattern_tokens: List[str]) -> bool: + if not tokens or not pattern_tokens: + return False + start_index = 0 + for token in pattern_tokens: + try: + found_index = tokens.index(token, start_index) + except ValueError: + return False + start_index = found_index + 1 + return True + + +def score_file_patterns(ctx: ScoringContext) -> Tuple[float, List[str]]: + if not ctx.definition.file: return 0.0, [] - for pattern in file_patterns[ctx.definition.file]: - if pattern.lower() in ctx.section_lower or ctx.section_lower in pattern.lower(): - return 0.15, [f"File pattern match ({ctx.definition.file}): +15%"] + section_leaf = ctx.section_lower.split(">")[-1].strip() + file_tokens = _file_tokens(ctx.definition.file) + for pattern, section_keywords in _FILE_SECTION_PATTERNS: + pattern_tokens = _file_tokens(pattern) + if not _tokens_contain_sequence(file_tokens, pattern_tokens): + continue + for section_keyword in section_keywords: + keyword_lower = section_keyword.lower() + if keyword_lower in section_leaf or section_leaf in keyword_lower: + return 0.15, [f"File pattern match ({ctx.definition.file}): +15%"] return 0.0, [] @@ -182,7 +304,7 @@ def score_comment_domain_match(ctx: ScoringContext) -> Tuple[float, List[str]]: "creation": ["create", "creation"], "generic": ["generic"], "filetype": ["filetype"], - "writing": ["write", "writing", "packagewriter"], + "writing": ["write", "writing"], } score = 0.0 reasoning: List[str] = [] diff --git 
a/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_type_keywords.py b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_type_keywords.py new file mode 100644 index 00000000..33070f5c --- /dev/null +++ b/scripts/lib/go_defs_index/_go_defs_index_scoring_rules_type_keywords.py @@ -0,0 +1,395 @@ +""" +Type and helper-function keyword scoring rules. +""" + +from __future__ import annotations + +from typing import List, Tuple + +from lib.go_defs_index._go_defs_index_scoring_rules_core import ScoringContext + + +def score_other_types_suffix(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if "other types" not in ctx.section_lower: + return 0.0, [] + if ctx.name_lower in { + "addfileoptions", + "extractpathoptions", + "removedirectoryoptions", + "fileinfo", + "filemetadataupdate", + "fileindex", + "indexentry", + "createoptions", + "packageconfig", + "pathhandling", + "destpathspec", + "symlinkconvertoptions", + "transformpipeline", + "transformstage", + "transformtype", + "tag", + "tagvaluetype", + "recoveryfileheader", + }: + return 0.0, [] + suffixes = [ + "options", + "config", + "info", + "entry", + "type", + "spec", + "handling", + "pipeline", + "index", + "header", + "rule", + "strategy", + "builder", + "worker", + "pool", + ] + if any(ctx.name_lower.endswith(suffix) for suffix in suffixes): + return 0.55, ["Type suffix matches Other Types section: +55%"] + return 0.0, [] + + +def score_generic_type_keywords(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if "generic types" not in ctx.section_lower: + return 0.0, [] + if ctx.name_lower in { + "addfileoptions", + "extractpathoptions", + "removedirectoryoptions", + "fileinfo", + "filemetadataupdate", + "createoptions", + "packageconfig", + "pathhandling", + "destpathspec", + "symlinkconvertoptions", + "fileindex", + "indexentry", + "transformpipeline", + 
"transformstage", + "transformtype", + "tag", + "tagvaluetype", + "recoveryfileheader", + }: + return 0.0, [] + keywords = [ + "config", + "builder", + "option", + "optional", + "result", + "strategy", + "validator", + "worker", + "rule", + "pool", + "job", + "thread", + "pathentry", + ] + if any(keyword in ctx.name_lower for keyword in keywords): + return 0.30, ["Generic type keyword matches Generic Types: +30%"] + return 0.0, [] + + +def score_metadata_type_keywords(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if ( + "package metadata types" not in ctx.section_lower + and "metadata types" not in ctx.section_lower + ): + return 0.0, [] + if ctx.name_lower in { + "pathentry", + "addfileoptions", + "extractpathoptions", + "removedirectoryoptions", + "fileinfo", + "filemetadataupdate", + "tag", + "tagvaluetype", + "transformpipeline", + "transformstage", + "transformtype", + }: + return 0.0, [] + keywords = [ + "metadata", + "manifest", + "index", + "signaturedata", + "signatureinfo", + "packageconfig", + "pathhandling", + "createoptions", + "destpathspec", + "symlinkconvertoptions", + "fileindex", + "indexentry", + "pathmetadata", + "pathinfo", + "pathstats", + "pathnode", + "pathtree", + "pathfilesystem", + "pathinheritance", + "pathmetadatapatch", + "pathmetadatatype", + "pathmetadataentry", + ] + if any(keyword in ctx.name_lower for keyword in keywords): + return 0.30, ["Metadata type keyword matches Package Metadata Types: +30%"] + return 0.0, [] + + +def score_signature_type_keywords(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if "signature types" not in ctx.section_lower: + return 0.0, [] + if ctx.name_lower in {"signaturedata", "signatureinfo"}: + return 0.0, [] + if ctx.name_lower in {"unsupportederrorcontext", "validationerrorcontext"}: + return 0.50, ["Signature error context matches Signature Types: +50%"] 
+ if ctx.name_lower == "signingkey": + return 0.30, ["Signature-adjacent type matches Signature Types: +30%"] + if "signature" in ctx.name_lower: + return 0.30, ["Signature type keyword matches Signature Types: +30%"] + return 0.0, [] + + +def score_error_type_keywords(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if "error types" not in ctx.section_lower: + return 0.0, [] + allowed = { + "errortype", + "packageerror", + "packageerrorcontext", + "ioerrorcontext", + "patternerrorcontext", + "readonlyerrorcontext", + } + if ctx.name_lower in allowed: + return 0.30, ["Error type keyword matches Error Types: +30%"] + return 0.0, [] + + +def score_file_entry_type_keywords(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if "fileentry types" not in ctx.section_lower: + return 0.0, [] + keywords = [ + "fileentry", + "filesource", + "hash", + "optionaldata", + "processingstate", + "addfileoptions", + "extractpathoptions", + "removedirectoryoptions", + "fileinfo", + "filemetadataupdate", + "tag", + "tagvaluetype", + "transformpipeline", + "transformstage", + "transformtype", + ] + if any(keyword in ctx.name_lower for keyword in keywords): + return 0.30, ["FileEntry type keyword matches FileEntry Types: +30%"] + return 0.0, [] + + +def score_file_info_preference(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if ctx.name_lower != "fileinfo": + return 0.0, [] + if "fileentry types" in ctx.section_lower: + return 0.40, ["FileInfo prefers FileEntry Types: +40%"] + if "package interface types" in ctx.section_lower: + return -0.40, ["FileInfo avoids Package Interface Types: -40%"] + return 0.0, [] + + +def score_recovery_file_header_preference(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if 
ctx.name_lower != "recoveryfileheader": + return 0.0, [] + if "package interface types" in ctx.section_lower: + return 0.40, ["RecoveryFileHeader prefers Package Interface Types: +40%"] + return 0.0, [] + + +def score_security_error_context_types(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if ctx.name_lower not in {"securityerrorcontext", "encryptionerrorcontext"}: + return 0.0, [] + if "encryption and security types" in ctx.section_lower: + return 0.20, ["Security error context matches Security Types: +20%"] + return 0.0, [] + + +def score_generic_helper_functions(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if "generic helper functions" not in ctx.section_lower: + return 0.0, [] + helper_names = { + "err", + "ok", + "processconcurrently", + "composevalidators", + "validateall", + "validatewith", + } + if ctx.definition.file == "api_generics.md" or ctx.name_lower in helper_names: + return 0.50, ["Generic helper function matches Generic Helpers: +50%"] + return 0.0, [] + + +def score_other_type_helper_functions(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if "other type helper functions" not in ctx.section_lower: + return 0.0, [] + if ctx.name_lower == "newpackagewithoptions": + return 0.0, [] + if ctx.name_lower.startswith("new") and "options" in ctx.name_lower: + return 0.60, ["Options constructor matches Other Type Helpers: +60%"] + return 0.0, [] + + +def score_generic_core_type_preference(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + generic_core = { + "config", + "configbuilder", + "strategy", + "validationrule", + "validator", + "workerpool", + "pathentry", + } + if ctx.name_lower not in generic_core: + return 0.0, [] + if "generic types" in ctx.section_lower: + return 0.30, ["Generic core type 
prefers Generic Types: +30%"] + if "other types" in ctx.section_lower: + return -0.30, ["Generic core type avoids Other Types: -30%"] + return 0.0, [] + + +def score_package_error_constructor(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if ctx.name_lower != "newpackageerror": + return 0.0, [] + if "error helper functions" in ctx.section_lower: + return 0.60, ["NewPackageError prefers Error Helper Functions: +60%"] + if "package helper functions" in ctx.section_lower: + return -0.60, ["NewPackageError avoids Package Helper Functions: -60%"] + return 0.0, [] + + +def score_signature_comment_helpers(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if ( + "package metadata helper functions" not in ctx.section_lower + and "metadata helper functions" not in ctx.section_lower + ): + return 0.0, [] + if "signaturecomment" in ctx.name_lower: + return 0.20, ["Signature comment helper matches Metadata Helpers: +20%"] + return 0.0, [] + + +def score_package_open_helpers(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if "package helper functions" not in ctx.section_lower: + return 0.0, [] + if ctx.name_lower.startswith("open"): + return 0.15, ["Open helper prefers Package Helper Functions: +15%"] + return 0.0, [] + + +def score_package_read_header_helpers(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if "package helper functions" not in ctx.section_lower: + return 0.0, [] + if ctx.name_lower == "readheader": + return 0.20, ["ReadHeader prefers Package Helper Functions: +20%"] + return 0.0, [] + + +def score_metadata_header_helpers(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if ( + "package metadata helper functions" not in ctx.section_lower + and "metadata helper functions" not in 
ctx.section_lower + ): + return 0.0, [] + if ctx.name_lower == "readheaderfrompath": + return 0.40, ["ReadHeaderFromPath matches Metadata Helpers: +40%"] + return 0.0, [] + + +def score_metadata_destpath_helpers(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if ( + "package metadata helper functions" not in ctx.section_lower + and "metadata helper functions" not in ctx.section_lower + ): + return 0.0, [] + if ctx.name_lower == "setdestpath": + return 0.40, ["SetDestPath matches Metadata Helpers: +40%"] + return 0.0, [] + + +def score_package_helper_overrides(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind != "func": + return 0.0, [] + if "package helper functions" not in ctx.section_lower: + return 0.0, [] + if ctx.name_lower == "newpackagewithoptions": + return 0.40, ["NewPackageWithOptions prefers Package Helper Functions: +40%"] + if ctx.name_lower == "readheaderfrompath": + return 0.20, ["ReadHeaderFromPath prefers Package Helper Functions: +20%"] + return 0.0, [] + + +def score_create_options_preference(ctx: ScoringContext) -> Tuple[float, List[str]]: + if ctx.definition.kind not in ("type", "struct"): + return 0.0, [] + if ctx.name_lower != "createoptions": + return 0.0, [] + if "package metadata types" in ctx.section_lower: + return 0.30, ["CreateOptions prefers Package Metadata Types: +30%"] + if "other types" in ctx.section_lower: + return -0.30, ["CreateOptions avoids Other Types: -30%"] + if "generic types" in ctx.section_lower: + return -0.30, ["CreateOptions avoids Generic Types: -30%"] + return 0.0, [] diff --git a/scripts/lib/go_defs_index/_go_defs_index_shared.py b/scripts/lib/go_defs_index/_go_defs_index_shared.py index 77482169..edc3c2d5 100644 --- a/scripts/lib/go_defs_index/_go_defs_index_shared.py +++ b/scripts/lib/go_defs_index/_go_defs_index_shared.py @@ -11,8 +11,6 @@ "filePackage": "Package", "readOnlyPackage": "Package", "readOnlyPackageImpl": "Package", - 
"packageReader": "PackageReader", - "packageWriter": "PackageWriter", } diff --git a/scripts/lib/go_markdown/__init__.py b/scripts/lib/go_markdown/__init__.py new file mode 100644 index 00000000..c93d92ec --- /dev/null +++ b/scripts/lib/go_markdown/__init__.py @@ -0,0 +1,67 @@ +""" +Go code block and signature utilities for markdown. + +Re-exports from _base and _rest for backward compatibility. +""" + +from lib.go_markdown._base import ( + EXAMPLE_MARKERS, + EXAMPLE_NAME_PREFIXES, + Signature, + determine_type_kind, + extract_go_doc_comment_above, + extract_receiver_type, + find_go_code_blocks, + is_example_code, + is_example_signature_name, + is_in_go_code_block, + is_public_name, + normalize_generic_name, + parse_go_def_signature, + remove_go_comments, +) +from lib.go_markdown._rest import ( + InterfaceParser, + check_kind_word_after, + count_go_definitions, + extract_interfaces_from_go_file, + extract_interfaces_from_markdown, + find_definition_line_index, + find_first_definition, + is_continuation_line, + is_definition_start_line, + is_example_definition, + is_signature_only_code_block, + normalize_go_signature, + normalize_go_signature_with_params, +) + +__all__ = [ + "EXAMPLE_MARKERS", + "EXAMPLE_NAME_PREFIXES", + "Signature", + "InterfaceParser", + "check_kind_word_after", + "count_go_definitions", + "determine_type_kind", + "extract_go_doc_comment_above", + "extract_interfaces_from_go_file", + "extract_interfaces_from_markdown", + "extract_receiver_type", + "find_go_code_blocks", + "find_first_definition", + "find_definition_line_index", + "is_continuation_line", + "is_definition_start_line", + "is_example_code", + "is_example_definition", + "is_example_signature_name", + "is_in_go_code_block", + "is_public_name", + "is_signature_only_code_block", + "normalize_generic_name", + "normalize_go_signature", + "normalize_go_signature_with_params", + "parse_go_def_signature", + "remove_go_comments", +] diff --git a/scripts/lib/go_markdown/_base.py 
b/scripts/lib/go_markdown/_base.py new file mode 100644 index 00000000..aa656b30 --- /dev/null +++ b/scripts/lib/go_markdown/_base.py @@ -0,0 +1,994 @@ +#!/usr/bin/env python3 +""" +Shared utilities for parsing and processing Go code blocks in markdown files. + +This module provides common functions for: +- Detecting and extracting Go code blocks from markdown +- Parsing Go function, method, and type signatures +- Normalizing Go signatures and type names +- Detecting example code (single lines and entire code blocks) +""" + +import re +from dataclasses import dataclass +from typing import List, Optional, Tuple + +from lib._validation_utils import find_heading_for_code_block + + +# Example detection markers +EXAMPLE_MARKERS = [ + 'hypothetical', 'not the actual', 'this is not', 'not a real', + 'example only', 'example type', 'example interface', 'example struct', + 'example version', 'example pattern', 'illustration only', + "not an actual", "shown for illustration" +] + +EXAMPLE_NAME_PREFIXES = ('Example', 'Hypothetical', 'Mock', 'Test') + +# Compiled regex patterns for performance +_RE_TYPE_NAME = re.compile(r'^\s*type\s+(\w+)') +_RE_FUNC_NAME = re.compile(r'^\s*func\s+(?:\([^)]+\)\s+)?(\w+)') +_RE_GO_COMMENT_SINGLE = re.compile(r'//.*$') +_RE_GO_COMMENT_SINGLE_MULTILINE = re.compile(r'//.*$', re.MULTILINE) +_RE_GO_COMMENT_MULTI = re.compile(r'/\*.*?\*/', flags=re.DOTALL) +_RE_GO_DOC_LINE = re.compile(r'^\s*//\s?(.*)$') +_RE_GO_BLOCK_COMMENT_START = re.compile(r'^\s*/\*\s?(.*)$') +_RE_GO_BLOCK_COMMENT_END = re.compile(r'^(.*)\*/\s*$') +_RE_INTERFACE_PATTERN = re.compile(r'^\s*(?:type\s+)?(\w+)(?:\s*\[[^\]]+\])?\s+interface\s*\{') +_RE_STRUCT_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+struct\s*\{') +_RE_ALIAS_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s*=\s') +_RE_POINTER_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+\*') +_RE_SLICE_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+\[\]') +_RE_MAP_PATTERN = 
re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+map\s*\[') +_RE_TYPE_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*\[[^\]]+\])?\s+\S') +_RE_AFTER_TYPE_PATTERN = re.compile(r'^\s*type\s+\w+(?:\s*\[[^\]]+\])?\s+(.+)') +_RE_REMOVE_BRACE = re.compile(r'\s*\{.*$') +_RE_METHOD_PATTERN = re.compile( + r'^\s*func\s+(\([^)]+\))\s+(\w+)(?:\s*\[[^\]]+\])?\s*\(([^)]*)\)\s*(.*)$' +) +_RE_FUNC_PATTERN = re.compile(r'^\s*func\s+(\w+)(?:\s*\[[^\]]+\])?\s*\(([^)]*)\)\s*(.*)$') +_RE_RECEIVER_TYPE = re.compile(r'^\s*(?:\w+\s+)?(?:\*)?\s*(\w+(?:\[[^\]]+\])?)') +_RE_WHITESPACE = re.compile(r'\s+') +_RE_GENERIC_PARAMS = re.compile(r'\[[^\]]+\]') +_RE_PACKAGE_TYPE = re.compile(r'\b([a-z][a-z0-9_]*(?:\.[a-z][a-z0-9_]*)*)\.([A-Z][A-Za-z0-9_]*)\b') +_RE_METHOD_NORMALIZE = re.compile(r'func\s+(\([^)]+\))\s+(\w+)\s*\(([^)]*)\)\s*(.*)$') +_RE_FUNC_NORMALIZE = re.compile(r'func\s+(\w+)\s*\(([^)]*)\)\s*(.*)$') +_RE_FUNC_WITH_PARAMS = re.compile(r'func\s+(?:\([^)]+\)\s+)?(\w+)\s*\(([^)]*)\)\s*(.*)$') +_RE_RECEIVER_MATCH = re.compile(r'func\s+(\([^)]+\))\s+') +_RE_WHITESPACE_NORMALIZE = re.compile(r'\s+') +_RE_GENERICS_TAG = re.compile(r'\bgenerics\.(Tag|TagValueType|PathEntry)\b') +_RE_METADATA_TYPES = re.compile( + r'\bmetadata\.(PackageMetadata|PackageInfo|FileEntry|PathMetadataEntry|ProcessingState)\b' +) +_RE_FILEFORMAT_TYPES = re.compile(r'\bfileformat\.(PackageHeader|FileIndex|IndexEntry)\b') +_RE_HEADER_TYPE = re.compile(r'\bHeader\b') +_RE_PKGERRORS_TYPES = re.compile(r'\bpkgerrors\.(ErrorType|PackageError)\b') +_RE_SIGNATURES_TYPES = re.compile(r'\bsignatures\.(Signature|SignatureInfo)\b') +_RE_FUNC_TYPE_DEF = re.compile(r'^\s*type\s+\w+(?:\s*\[[^\]]+\])?\s+func\s*\(') + + +def find_go_code_blocks(content: str) -> List[Tuple[int, int, str]]: + """ + Find all Go code blocks in markdown content. + + Args: + content: Markdown content as string + + Returns: + List of tuples: (start_line, end_line, code_content) + Lines are 1-indexed. 
+ """ + go_blocks = [] + lines = content.split('\n') + + i = 0 + while i < len(lines): + line = lines[i] + + # Check for Go code block start + if line.strip() == '```go': + start_line = i + 1 # 1-indexed for reporting + code_lines = [] + i += 1 + + # Collect code until closing ``` + while i < len(lines) and lines[i].strip() != '```': + code_lines.append(lines[i]) + i += 1 + + if i < len(lines): # Found closing ``` + code_content = '\n'.join(code_lines) + go_blocks.append((start_line, i + 1, code_content)) + + i += 1 + + return go_blocks + + +def is_in_go_code_block(content: str, line_num: int) -> bool: + """ + Check if a given line number is inside a Go code block. + + Args: + content: Markdown content as string + line_num: Line number to check (1-indexed) + + Returns: + True if the line is inside a ```go code block + """ + lines = content.split('\n') + in_go_block = False + + for _idx, line in enumerate(lines[:line_num], 1): + if line.strip() == '```go': + in_go_block = True + elif line.strip() == '```' and in_go_block: + in_go_block = False + + return in_go_block + + +def _heading_suggests_example(heading_text: Optional[str]) -> bool: + """Return True if heading text suggests example code.""" + if not heading_text: + return False + heading_lower = heading_text.lower() + if any(marker in heading_lower for marker in EXAMPLE_MARKERS): + return True + return "example" in heading_lower + + +def _prose_line_suggests_example(prose_lower: str) -> bool: + """Return True if a prose line suggests example code.""" + if any(marker in prose_lower for marker in EXAMPLE_MARKERS): + return True + if "example" not in prose_lower: + return False + example_phrases = [ + "this is an example", "example:", "example code", "example only", + "example type", "example interface" + ] + if any(phrase in prose_lower for phrase in example_phrases): + return True + example_markers = ["example code", "example type", "example interface"] + if "for example" in prose_lower and not any(m in 
prose_lower for m in example_markers): + return False + return False + + +def _prose_before_block_suggests_example( + lines: List[str], + start_line: int, + check_prose_before_block: bool, +) -> bool: + """Return True if prose immediately before the code block suggests example.""" + if not check_prose_before_block or start_line <= 1: + return False + prose_start = max(0, start_line - 11) + prose_end = start_line - 1 + for j in range(prose_start, prose_end): + if j < 0 or j >= len(lines): + continue + line = lines[j] + stripped = line.strip() + if not stripped or stripped in ('```', '```go') or stripped.startswith('#'): + continue + if _prose_line_suggests_example(line.lower()): + return True + return False + + +def _comment_line_suggests_example(prev_lower: str) -> bool: + """Return True if a code/comment line suggests example.""" + if any(marker in prev_lower for marker in EXAMPLE_MARKERS): + return True + if "example" not in prev_lower: + return False + comment_phrases = ["// example", "// example:", "example code", "example only"] + if any(phrase in prev_lower for phrase in comment_phrases): + return True + return "for example" not in prev_lower + + +def _name_suggests_example(stripped: str) -> bool: + """Return True if type or function name suggests example.""" + type_match = _RE_TYPE_NAME.match(stripped) + if type_match and type_match.group(1).startswith(EXAMPLE_NAME_PREFIXES): + return True + func_match = _RE_FUNC_NAME.match(stripped) + return bool(func_match and func_match.group(1).startswith(EXAMPLE_NAME_PREFIXES)) + + +def _code_line_suggests_example( + lines: List[str], + line_index: int, + start_line: int, +) -> bool: + """Return True if this code line or its preceding lines suggest example.""" + for j in range(max(start_line - 1, line_index - 5), line_index): + if j < 0 or j >= len(lines): + continue + if _comment_line_suggests_example(lines[j].lower()): + return True + line = lines[line_index] if line_index < len(lines) else '' + return 
_name_suggests_example(line.strip()) + + +def _get_lines_to_check( + code: str, + start_line: int, + lines: List[str], + check_single_line: Optional[int], + max_lines_to_check: int, +) -> Optional[List[int]]: + """Return list of line indices to check for example markers, or None if invalid.""" + if check_single_line is not None: + line_index = start_line - 1 + check_single_line + if line_index < 0 or line_index >= len(lines): + return None + return [line_index] + code_lines = code.split('\n') + if not code_lines: + return None + result = [] + for i, code_line in enumerate(code_lines[:max_lines_to_check]): + if code_line.strip() and not code_line.strip().startswith('```'): + line_idx = start_line - 1 + i + if line_idx < len(lines): + result.append(line_idx) + return result + + +def is_example_code( + code: str, + start_line: int, + *, + content: Optional[str] = None, + lines: Optional[List[str]] = None, + heading_text: Optional[str] = None, + auto_find_heading: bool = False, + check_prose_before_block: bool = True, + check_single_line: Optional[int] = None, + max_lines_to_check: int = 5, +) -> bool: + """ + Check if Go code is example code. 
+ + This unified function can check: + - A single line within a code block + - Multiple lines in a code block (default: first 5 lines) + - An entire code block + + Looks for example markers in: + - The heading above the code block (if provided or auto-found) + - Prose text immediately before the code block (if check_prose_before_block is True) + - Previous lines within the code block + - The name of the type/function definition + + Args: + code: The code block content (without ```go markers) OR a single line + start_line: Line number where the code block starts (1-indexed) + content: Full markdown content (preferred - used for heading finding and prose checking) + lines: All lines of the file as a list + (alternative to content; content will be derived if needed) + heading_text: Optional heading text above the code block + auto_find_heading: If True, automatically find heading from content + check_prose_before_block: If True, check prose text between heading and code block + check_single_line: If provided (0-indexed line number within code), check only that line + max_lines_to_check: Maximum number of lines to check in code block (default: 5) + + Returns: + True if the code appears to be example code + """ + if content is None and lines is not None: + content = '\n'.join(lines) + if lines is None: + lines = content.split('\n') if content else [''] * (start_line - 1) + code.split('\n') + if auto_find_heading and content: + heading_text = find_heading_for_code_block(content, start_line) + if not lines: + return False + if _heading_suggests_example(heading_text): + return True + if _prose_before_block_suggests_example(lines, start_line, check_prose_before_block): + return True + lines_to_check = _get_lines_to_check( + code, start_line, lines, check_single_line, max_lines_to_check + ) + if not lines_to_check: + return False + for line_index in lines_to_check: + if line_index >= start_line - 1 and _code_line_suggests_example( + lines, line_index, start_line + ): + 
return True + return False + + +def is_example_signature_name(name: str) -> bool: + """ + Check if a signature name indicates it's an example. + + Args: + name: Signature name to check + + Returns: + True if the name starts with example prefixes + """ + return name.startswith(EXAMPLE_NAME_PREFIXES) + + +def remove_go_comments(text: str, multiline: bool = False) -> str: + """ + Remove Go comments from text (single or multi-line). + + Args: + text: Go code text + multiline: If True, handles multi-line strings and block comments. + If False, also strips whitespace (for single-line usage). + + Returns: + Text with comments removed (and stripped if multiline=False) + """ + if multiline: + text = _RE_GO_COMMENT_SINGLE_MULTILINE.sub('', text) + text = _RE_GO_COMMENT_MULTI.sub('', text) + else: + text = _RE_GO_COMMENT_SINGLE.sub('', text) + # For single-line usage, strip whitespace (matches original behavior) + text = text.strip() + return text + + +def _collect_block_comment_lines(code_lines, i, end_match, should_skip): + """Collect multi-line block comment parts upward; return (block_text, new_i).""" + block_parts: List[str] = [] + end_text = end_match.group(1).strip() + if end_text and not should_skip(end_text): + block_parts.insert(0, end_text) + i -= 1 + while i >= 0: + raw2 = code_lines[i].rstrip("\n") + stripped2 = raw2.strip() + start_match = _RE_GO_BLOCK_COMMENT_START.match(stripped2) + if start_match: + start_text = start_match.group(1).strip() + if start_text and start_text != "*/": + cleaned = start_text.replace("*/", "").strip() + if cleaned and not should_skip(cleaned): + block_parts.insert(0, cleaned) + break + if stripped2: + if stripped2.startswith("*"): + stripped2 = stripped2[1:].strip() + cleaned = stripped2.replace("*/", "").strip() + if cleaned and not should_skip(cleaned): + block_parts.insert(0, cleaned) + i -= 1 + block_text = " ".join([p for p in block_parts if p]).strip() + return (block_text, i) + + +def extract_go_doc_comment_above( + 
code_lines: List[str], + definition_line_index: int, +) -> str: + """ + Extract doc comment text immediately above a definition line. + + This is an additive helper intended for future scoring improvements. + + Args: + code_lines: List of Go code lines (no markdown fences). + definition_line_index: 0-based index of the definition line within code_lines. + + Returns: + Normalized doc comment text, or empty string if none. + """ + if not code_lines: + return "" + if definition_line_index <= 0 or definition_line_index > len(code_lines) - 1: + return "" + + # Walk upward collecting contiguous comment lines / blocks. + collected: List[str] = [] + i = definition_line_index - 1 + + def _should_skip_doc_line(text: str) -> bool: + # Skip TODO/FIXME lines, but keep other doc comment content. + t = (text or "").strip() + if not t: + return True + upper = t.upper() + return upper.startswith("TODO:") or upper.startswith("FIXME:") + + while i >= 0: + raw = code_lines[i].rstrip("\n") + stripped = raw.strip() + + if not stripped: + # Allow blank lines between comment lines but stop if we already started + # collecting and then hit a blank line (doc comments must be adjacent). + if collected: + break + i -= 1 + continue + + # Single-line doc comment: // ... + m = _RE_GO_DOC_LINE.match(raw) + if m: + text = m.group(1).strip() + if text and not _should_skip_doc_line(text): + collected.insert(0, text) + i -= 1 + continue + + # Inline block comment: /* ... 
*/ + if "/*" in stripped and "*/" in stripped: + inner = _RE_GO_COMMENT_MULTI.sub(lambda mm: mm.group(0)[2:-2], stripped) + inner = inner.strip() + if inner and not _should_skip_doc_line(inner): + collected.insert(0, inner) + i -= 1 + continue + + end_match = _RE_GO_BLOCK_COMMENT_END.match(stripped) + if end_match and "/*" not in stripped: + block_text, i = _collect_block_comment_lines( + code_lines, i, end_match, _should_skip_doc_line + ) + if block_text: + collected.insert(0, block_text) + i -= 1 + continue + + # Not a comment line; stop. + break + + # Normalize whitespace. + out = " ".join(collected).strip() + out = _RE_WHITESPACE_NORMALIZE.sub(" ", out) + return out + + +def determine_type_kind(line: str) -> Optional[str]: + """ + Determine the kind of a Go type definition from a line. + + Extracts the kind from a Go type definition line (see Returns for the possible values). + Checks interfaces first, then structs, aliases, pointers, slices, maps, then other types. + + Args: + line: Line of Go code + + Returns: + 'interface', 'struct', 'alias', 'pointer', 'slice', 'map', 'type', + or None if not a type definition + + Examples: + - "type Package interface {" -> 'interface' + - "type FileEntry struct {" -> 'struct' + - "type ProcessingState uint8" -> 'type' + - "type Option[T] struct {" -> 'struct' + - "type Name = SomeType" -> 'alias' (type alias) + - "type Name[T] = SomeType[T]" -> 'alias' (generic type alias) + - "type Name *SomeType" -> 'pointer' (pointer type) + - "type Name []SomeType" -> 'slice' (slice type) + - "type Name map[K]V" -> 'map' (map type) + - "type Name SomeType" -> 'type' (regular type definition) + """ + line_clean = remove_go_comments(line) + + # Check for interface definitions FIRST (before type definitions) + # This ensures interfaces are correctly classified, not as types + # Pattern: type Name interface { or Name interface { + interface_match = _RE_INTERFACE_PATTERN.match(line_clean) + if interface_match: + return 'interface' 
+ + # Check for struct definitions (distinct from other types) + # Pattern: type Name struct { or type Name[T] struct { + struct_match = _RE_STRUCT_PATTERN.match(line_clean) + if struct_match: + return 'struct' + + # Check for type aliases: type Name = Type or type Name[T] = Type + # This must be checked before other type definitions + alias_match = _RE_ALIAS_PATTERN.match(line_clean) + if alias_match: + return 'alias' + + # Check for pointer types: type Name *SomeType or type Name[T] *SomeType + pointer_match = _RE_POINTER_PATTERN.match(line_clean) + if pointer_match: + return 'pointer' + + # Check for slice types: type Name []SomeType or type Name[T] []SomeType + slice_match = _RE_SLICE_PATTERN.match(line_clean) + if slice_match: + return 'slice' + + # Check for map types: type Name map[K]V or type Name[T] map[K]V + map_match = _RE_MAP_PATTERN.match(line_clean) + if map_match: + return 'map' + + # Check for other type definitions + # (custom types, etc. - excludes structs, interfaces, aliases, pointers, slices, maps) + # Pattern: type Name SomeType or type Name[T] SomeType + # This handles regular type definitions (may or may not have generics) + type_match = _RE_TYPE_PATTERN.match(line_clean) + if type_match: + # Make sure it's not already matched by struct/interface/alias/pointer/slice/map patterns + # and it's not a function type + # Check that it doesn't start with pointer, slice, or map patterns + if ('struct' not in line_clean and 'interface' not in line_clean + and '=' not in line_clean and 'func(' not in line_clean): + # Check if it's a pointer, slice, or map (already handled above) + # by checking if the pattern after type name matches those + after_type_match = _RE_AFTER_TYPE_PATTERN.match(line_clean) + if after_type_match: + after_type = after_type_match.group(1).strip() + # If it doesn't start with *, [], or map[, it's a regular type + if (not after_type.startswith('*') + and not after_type.startswith('[]') + and not after_type.startswith('map[')): + 
return 'type' + + return None + + +def parse_go_def_signature(line: str, location: str = "") -> Optional[Signature]: + """ + Parse a Go definition signature from a line (function, method, or type). + + Args: + line: Line of Go code + location: Optional location string (file path and line number) + + Returns: + Signature object or None if no definition found + - For functions/methods: kind='func' or 'method', includes params and returns + - For types: kind='type', 'interface', 'struct', etc., includes generic_params + """ + line_clean = remove_go_comments(line) + + # Try to parse as function/method first + # Remove opening brace if present + line_no_brace = _RE_REMOVE_BRACE.sub('', line_clean).strip() + + # Method: func (r *Receiver) Name(params) returns + method_match = _RE_METHOD_PATTERN.match(line_no_brace) + if method_match: + receiver_str = method_match.group(1) + name = method_match.group(2) + params = method_match.group(3) + returns = method_match.group(4).strip() + receiver_type = extract_receiver_type(receiver_str) + return Signature( + name=name, + kind='method', + receiver=receiver_type, + params=params, + returns=returns, + location=location, + is_public=is_public_name(name) + ) + + # Function: func Name(params) returns + func_match = _RE_FUNC_PATTERN.match(line_no_brace) + if func_match: + name = func_match.group(1) + params = func_match.group(2) + returns = func_match.group(3).strip() + return Signature( + name=name, + kind='func', + params=params, + returns=returns, + location=location, + is_public=is_public_name(name) + ) + + # Try to parse as type definition + kind = determine_type_kind(line_clean) + if kind is not None: + # Special-case: interfaces may be written as: + # - type Name interface { ... } + # - Name interface { ... } + # + # determine_type_kind() supports both forms, but the generic type match below + # only matches "type Name ...", so handle interface explicitly. 
+ if kind == 'interface': + interface_match = re.match( + r'^\s*(?:type\s+)?(\w+)(?:\s*(\[[^\]]+\]))?\s+interface\s*\{', + line_clean, + ) + if interface_match: + name = interface_match.group(1) + generic_params = interface_match.group(2) # e.g., "[T any]" + return Signature( + name=name, + kind='interface', + generic_params=generic_params, + location=location, + is_public=is_public_name(name), + ) + + # Extract name and generic parameters + type_match = re.match( + r'^\s*type\s+(\w+)(?:\s*(\[[^\]]+\]))?\s+', + line_clean + ) + if type_match: + name = type_match.group(1) + generic_params = type_match.group(2) # e.g., "[T any]" + return Signature( + name=name, + kind=kind, # 'type', 'interface', 'struct', 'alias', etc. + generic_params=generic_params, + location=location, + is_public=is_public_name(name) + ) + + return None + + +def extract_receiver_type(receiver_str: str, normalize_generics: bool = False) -> str: + """ + Extract the type name from a receiver string. + + Args: + receiver_str: Receiver string like "(r *Receiver)" or "(o *Option[T])" or just "Package" + normalize_generics: If True, remove generic parameters from the type name + + Returns: + Type name (e.g., "Receiver" or "Option") + """ + # If already just a type name (starts with capital, no parentheses), return as-is + if receiver_str and receiver_str[0].isupper() and '(' not in receiver_str: + if normalize_generics: + return normalize_generic_name(receiver_str) + return receiver_str + + # Remove parentheses if present + receiver_clean = receiver_str.strip('()').strip() + + # Pattern: variableName *TypeName or variableName TypeName + # Also handle: *TypeName (no variable name) + match = _RE_RECEIVER_TYPE.match(receiver_clean) + if match: + type_name = match.group(1) + if normalize_generics: + return normalize_generic_name(type_name) + return type_name + + # Fallback: split by spaces and take last part + parts = receiver_clean.split() + if len(parts) >= 2: + # Has variable name, last part is type + 
type_name = parts[-1] + if normalize_generics: + return normalize_generic_name(type_name) + return type_name + if len(parts) == 1: + # Single word - could be type name or pointer + single_word = parts[0] + # Remove leading * if present + if single_word.startswith('*'): + type_name = single_word[1:] + else: + type_name = single_word + if normalize_generics: + return normalize_generic_name(type_name) + return type_name + + # If single word and starts with capital, it's likely already a type name + if receiver_clean and receiver_clean[0].isupper(): + # Remove leading * if present + if receiver_clean.startswith('*'): + type_name = receiver_clean[1:] + else: + type_name = receiver_clean + if normalize_generics: + return normalize_generic_name(type_name) + return type_name + + return receiver_clean + + +def normalize_generic_name(name: str) -> str: + """ + Normalize a generic type name by removing generic parameters. + + Args: + name: Type name that may include generics (e.g., "Option[T]", "BufferPool[T any]") + + Returns: + Base type name without generics (e.g., "Option", "BufferPool") + + Examples: + - "Option[T]" -> "Option" + - "BufferPool[T]" -> "BufferPool" + - "ConfigBuilder[T]" -> "ConfigBuilder" + - "Option" -> "Option" (no change if no generics) + - "Container[Option[T]]" -> "Container" (handles nested generics) + - "Type[]" -> "Type" (handles empty brackets) + """ + # Remove generic parameters like [T], [T any], [T, U], [], etc. 
+ # Handle nested brackets by repeatedly removing the rightmost bracket pair + # This ensures we remove innermost brackets first + result = name + while True: + # Find the rightmost [ that has a matching ] + # We'll work backwards to find balanced brackets + last_open = result.rfind('[') + if last_open == -1: + break # No more brackets + + # Find the matching closing bracket + bracket_count = 0 + found_close = False + for i in range(last_open, len(result)): + if result[i] == '[': + bracket_count += 1 + elif result[i] == ']': + bracket_count -= 1 + if not bracket_count: + # Found matching bracket, remove this bracket pair + result = result[:last_open] + result[i + 1:] + found_close = True + break + + if not found_close: + # Unmatched bracket, just remove the [ + result = result[:last_open] + result[last_open + 1:] + + return result + + +def _normalize_go_signature_preprocessing( + sig_str: str, use_whitespace_normalize: bool = False +) -> str: + """Common preprocessing for signature normalization. + + Args: + sig_str: Go signature string + use_whitespace_normalize: If True, use _RE_WHITESPACE_NORMALIZE, + else _RE_WHITESPACE + + Returns: + Preprocessed signature string + """ + # Remove comments + sig_str = remove_go_comments(sig_str, multiline=True) + + # Normalize whitespace + if use_whitespace_normalize: + sig_str = _RE_WHITESPACE_NORMALIZE.sub(' ', sig_str) + else: + sig_str = _RE_WHITESPACE.sub(' ', sig_str) + sig_str = sig_str.strip() + + # Remove generic type parameters + sig_str = _RE_GENERIC_PARAMS.sub('', sig_str) + + return sig_str + + +def _normalize_package_names_general(sig_str: str) -> str: + """Normalize package-qualified type names to short names (general approach). + + Pattern: package.Type -> Type + Only normalizes internal NovusPack packages, not standard library packages. 
+ """ + standard_lib_packages = { + 'context', 'errors', 'fmt', 'io', 'os', 'strings', 'bytes', 'time', + 'sync', 'reflect', 'encoding', 'encoding/json', 'encoding/binary', + 'crypto', 'net', 'path', 'path/filepath', 'syscall', 'unicode', + 'math', 'sort', 'strconv', 'bufio', 'compress', 'archive', 'hash' + } + + def replace_package_type(match): + full_match = match.group(0) + package_part = match.group(1) + type_name = match.group(2) + + # Check if this is a standard library package + base_package = ( + package_part.split('.')[0] if '.' in package_part else package_part + ) + if base_package in standard_lib_packages: + return full_match # Keep standard library types as-is + + # For internal packages, return just the type name + return type_name + + return _RE_PACKAGE_TYPE.sub(replace_package_type, sig_str) + + +def _normalize_package_names_specific(sig_str: str) -> str: + """Normalize package names using specific regex substitutions. + + For sync validation. Handles re-exported types: generics.X -> X, + metadata.X -> X, etc. + """ + sig_str = _RE_GENERICS_TAG.sub(r'\1', sig_str) + sig_str = _RE_METADATA_TYPES.sub(r'\1', sig_str) + sig_str = _RE_FILEFORMAT_TYPES.sub(r'\1', sig_str) + sig_str = _RE_HEADER_TYPE.sub('PackageHeader', sig_str) + sig_str = _RE_PKGERRORS_TYPES.sub(r'\1', sig_str) + sig_str = _RE_SIGNATURES_TYPES.sub(r'\1', sig_str) + return sig_str + + +def _normalize_returns_simple( + returns: str, normalize_param_list_func +) -> Tuple[str, bool]: + """Normalize return values - simple approach (removes names). 
+ + Returns: + Tuple of (normalized_returns, has_multiple_returns) + """ + normalized_returns = "" + has_multiple_returns = False + if returns: + returns_stripped = returns.strip() + if returns_stripped.startswith('(') and returns_stripped.endswith(')'): + returns_content = returns_stripped[1:-1].strip() + normalized_returns = normalize_param_list_func(returns_content) + has_multiple_returns = True + elif ',' in returns_stripped: + normalized_returns = normalize_param_list_func(returns_stripped) + has_multiple_returns = True + else: + normalized_returns = normalize_param_list_func(returns_stripped) + has_multiple_returns = False + return normalized_returns, has_multiple_returns + + +def _extract_receiver_type_safe(receiver_str: str) -> str: + """Extract receiver type safely, handling both (Type) and (var *Type) formats.""" + receiver_clean = receiver_str.strip('()').strip() + # Check if it's already just a type name (single word, starts with capital) + if (len(receiver_clean.split()) == 1 and receiver_clean and + receiver_clean[0].isupper()): + return receiver_clean + + # Has variable name, extract type + receiver_type = extract_receiver_type(receiver_str, normalize_generics=False) + # Fallback: if extraction failed, try to get last word + if not receiver_type or receiver_type == receiver_str.strip('()'): + parts = receiver_clean.split() + if len(parts) >= 2: + receiver_type = parts[-1] # Last part is the type + else: + receiver_type = receiver_clean + return receiver_type + + +def _format_normalized_signature( + name: str, + normalized_params: str, + normalized_returns: str, + *, + receiver_type: Optional[str] = None, + has_multiple_returns: bool = False, + always_paren_returns: bool = False, +) -> str: + """ + Format a normalized Go signature string from its components. 
+ + Args: + name: Function/method name + normalized_params: Normalized parameter list (types only) + normalized_returns: Normalized return values + receiver_type: Receiver type (if method, None for function) + has_multiple_returns: Whether there are multiple return values + (for simple normalization) + always_paren_returns: If True, always use parentheses for returns + (for param-preserving normalization) + + Returns: + Formatted signature string + """ + # Format return values + if normalized_returns: + # Determine if we should use parentheses + use_parens = always_paren_returns or has_multiple_returns + if use_parens: + returns_str = f"({normalized_returns})" + else: + returns_str = normalized_returns + else: + returns_str = "" + + # Format the signature + if receiver_type: + # Method with receiver + if returns_str: + return f"func ({receiver_type}) {name}({normalized_params}) {returns_str}" + return f"func ({receiver_type}) {name}({normalized_params})" + # Function without receiver + if returns_str: + return f"func {name}({normalized_params}) {returns_str}" + return f"func {name}({normalized_params})" + + +@dataclass(frozen=True) +class Signature: + """ + Represents a Go function, method, or type signature. + + This is a shared dataclass used across multiple validation scripts. + Optional fields allow different scripts to track additional information + as needed. 
+ """ + name: str + kind: str # 'func', 'method', 'type', 'interface' + receiver: Optional[str] = None # For methods: the receiver type + params: str = "" # Parameter list as string + returns: str = "" # Return types as string + location: str = "" # File path and line number + is_public: bool = True # Whether it's exported (starts with capital) + # Optional fields for scripts that need more detail + has_body: bool = False # Whether this is a full definition with body + method_count: int = 0 # For interfaces: number of methods in body + field_count: int = 0 # For structs: number of fields in body + generic_params: Optional[str] = None # Generic parameters like "[T any]" or None + + def normalized_key(self) -> str: + """Generate a normalized key for comparison.""" + if self.kind == 'method' and self.receiver: + return f"{self.receiver}.{self.name}" + if self.kind in ('type', 'interface') and self.generic_params: + # Include generics in key to distinguish SigningKey from SigningKey[T] + return f"{self.name}{self.generic_params}" + return self.name + + def normalized_signature(self) -> str: + """Generate a normalized signature string for comparison.""" + # Normalize whitespace and remove comments + params = _RE_WHITESPACE_NORMALIZE.sub(' ', self.params.strip()) + returns = _RE_WHITESPACE_NORMALIZE.sub(' ', self.returns.strip()) + + if self.kind == 'method': + return f"func ({self.receiver}) {self.name}({params}) ({returns})" + if self.kind == 'func': + return f"func {self.name}({params}) ({returns})" + if self.kind == 'type': + return f"type {self.name}" + if self.kind == 'interface': + return f"type {self.name} interface" + return f"{self.kind} {self.name}" + + def normalized_type_name(self) -> str: + """ + Get normalized type name (without generics for display purposes). + + For types with generics, returns just the base name. + For other types, returns the name as-is. 
+ """ + if self.generic_params: + return normalize_generic_name(self.name) + return self.name + + def is_method(self) -> bool: + """Check if this is a method (has receiver).""" + return self.kind == 'method' and self.receiver is not None + + +def is_public_name(name: str) -> bool: + """ + Check if a name is public (exported) in Go. + + In Go, exported identifiers start with an uppercase letter. + + Args: + name: The name to check + + Returns: + True if the name is public (starts with uppercase letter) + """ + return bool(name and name[0].isupper()) diff --git a/scripts/lib/go_markdown/_rest.py b/scripts/lib/go_markdown/_rest.py new file mode 100644 index 00000000..5f86b1af --- /dev/null +++ b/scripts/lib/go_markdown/_rest.py @@ -0,0 +1,921 @@ +"""Go code block discovery, signatures, interfaces, definitions (part 2).""" + +import re +from pathlib import Path +from typing import Dict, List, Optional, Tuple + +from lib.go_markdown._base import ( + Signature, + _RE_FUNC_NORMALIZE, + _RE_FUNC_TYPE_DEF, + _RE_FUNC_WITH_PARAMS, + _RE_METHOD_NORMALIZE, + _RE_RECEIVER_MATCH, + _extract_receiver_type_safe, + _format_normalized_signature, + _normalize_go_signature_preprocessing, + _normalize_package_names_general, + _normalize_package_names_specific, + _normalize_returns_simple, + extract_receiver_type, + find_go_code_blocks, + is_example_code, + is_public_name, + parse_go_def_signature, + remove_go_comments, +) + + +def normalize_go_signature(sig_str: str) -> str: + """ + Normalize a Go signature string for comparison. + + Removes comments, normalizes whitespace, standardizes package names, + and extracts receiver types properly. 
+ + Args: + sig_str: Go signature string + + Returns: + Normalized signature string + """ + # Common preprocessing + sig_str = _normalize_go_signature_preprocessing(sig_str, use_whitespace_normalize=False) + + # Normalize package names (general approach) + sig_str = _normalize_package_names_general(sig_str) + + # Normalize parameter list (remove parameter names, keep types) + def normalize_param_list(param_str: str) -> str: + if not param_str.strip(): + return "" + # Simple normalization: remove parameter names + # Pattern: name Type -> Type + params = [] + for param in param_str.split(','): + param = param.strip() + parts = param.split() + if len(parts) >= 2: + # Has name and type: keep type part + params.append(' '.join(parts[1:])) + else: + params.append(param) + return ", ".join(params) + + # Extract and normalize function signatures + # Handle method with receiver - receiver can be in format (Type) or (var *Type) + method_match = _RE_METHOD_NORMALIZE.match(sig_str) + if method_match: + receiver_str = method_match.group(1) + name = method_match.group(2) + params = method_match.group(3) + returns = method_match.group(4).strip() + + # Extract receiver type + receiver_type = _extract_receiver_type_safe(receiver_str) + + normalized_params = normalize_param_list(params) + + # Normalize return values + normalized_returns, has_multiple_returns = _normalize_returns_simple( + returns, normalize_param_list + ) + + # Format signature using shared helper + return _format_normalized_signature( + name=name, + normalized_params=normalized_params, + normalized_returns=normalized_returns, + receiver_type=receiver_type, + has_multiple_returns=has_multiple_returns, + always_paren_returns=False + ) + + # Check for function without receiver + func_match = _RE_FUNC_NORMALIZE.match(sig_str) + if func_match: + name = func_match.group(1) + params = func_match.group(2) + returns = func_match.group(3).strip() + + normalized_params = normalize_param_list(params) + + # Normalize return 
values + normalized_returns, has_multiple_returns = _normalize_returns_simple( + returns, normalize_param_list + ) + + # Format signature using shared helper + return _format_normalized_signature( + name=name, + normalized_params=normalized_params, + normalized_returns=normalized_returns, + receiver_type=None, + has_multiple_returns=has_multiple_returns, + always_paren_returns=False + ) + + return sig_str + + +def normalize_go_signature_with_params(sig_str: str) -> str: + """ + Normalize a Go signature string for comparison while preserving parameter names. + + This is a specialized version for sync validation that handles shorthand + notation and keeps parameter names for exact matching. Use this when you need + to compare signatures where parameter names must match exactly. + + The general-purpose `normalize_go_signature()` removes parameter names and + is better suited for general signature normalization. + + Normalizes: + - Extra whitespace + - Comments + - Generic type parameters (for comparison purposes) + - Package name differences (generics.X vs X) + + Keeps: + - Parameter names (must match exactly) + - Return value names (must match exactly) + """ + # Common preprocessing + sig_str = _normalize_go_signature_preprocessing(sig_str, use_whitespace_normalize=True) + + # Normalize package names (specific approach for sync validation) + sig_str = _normalize_package_names_specific(sig_str) + + # Remove parameter names, keep only types + # Pattern: name Type -> Type + # Handle: ctx context.Context, path string -> context.Context, string + # Handle: offset, size int64 -> int64, int64 + + def _is_parameter_name_only(param: str) -> bool: + """Check if parameter looks like just a name (no type indicators).""" + return not any(c in param for c in [' ', '.', '*', '[', ']', '(', ')']) + + def _can_split_normalized_param(normalized: str) -> bool: + """Check if normalized parameter can be safely split by comma.""" + return (',' in normalized and + not any(c in normalized 
for c in ['*[', '[]', 'map['])) + + def _process_param_token(param: str, normalize_single_param_func) -> List[Tuple[str, str]]: + """Process a single parameter token, returning list of (tag, value) tuples.""" + if not param: + return [] + + if _is_parameter_name_only(param): + # Just a name, might be part of shorthand - keep it for later processing + return [('name', param)] + + normalized = normalize_single_param_func(param) + if _can_split_normalized_param(normalized): + return [('type', p.strip()) for p in normalized.split(',')] + return [('type', normalized)] + + def _process_last_param_with_shorthand( + param: str, + params: List[Tuple[str, str]], + normalize_single_param_func + ) -> None: + """Process the last parameter, handling shorthand notation.""" + if not param: + return + + # Check if previous params ended with names (shorthand pattern) + if params and params[-1][0] == 'name': + # This is the type for the shorthand names + type_part = normalize_single_param_func(param) + # Replace all trailing 'name' entries with this type + i = len(params) - 1 + while i >= 0 and params[i][0] == 'name': + params[i] = ('type', type_part) + i -= 1 + else: + tokens = _process_param_token(param, normalize_single_param_func) + params.extend(tokens) + + def _resolve_remaining_names( + params: List[Tuple[str, str]] + ) -> List[str]: + """Resolve any remaining name entries to types, handling edge cases.""" + final_params = [] + i = 0 + while i < len(params): + if params[i][0] == 'name': + # Collect consecutive names + names = [params[i][1]] + i += 1 + while i < len(params) and params[i][0] == 'name': + names.append(params[i][1]) + i += 1 + # If next is a type, use it; otherwise these are invalid + if i < len(params) and params[i][0] == 'type': + type_part = params[i][1] + final_params.extend([type_part] * len(names)) + i += 1 + else: + # Invalid - just use the names as-is (shouldn't happen) + final_params.extend(names) + else: + final_params.append(params[i][1]) + i += 1 + 
return final_params + + def normalize_param_list(param_str: str) -> str: + if not param_str.strip(): + return "" + # Split parameters by comma, but be careful with nested structures + params = [] + current = "" + paren_depth = 0 + bracket_depth = 0 + + for char in param_str: + if char == '(': + paren_depth += 1 + elif char == ')': + paren_depth -= 1 + elif char == '[': + bracket_depth += 1 + elif char == ']': + bracket_depth -= 1 + elif char == ',' and not paren_depth and not bracket_depth: + # Found a top-level comma separator + param = current.strip() + if param: + tokens = _process_param_token(param, normalize_single_param) + params.extend(tokens) + current = "" + continue + current += char + + # Process last param + if current.strip(): + _process_last_param_with_shorthand( + current.strip(), params, normalize_single_param + ) + + # Handle any remaining name entries (shouldn't happen in valid Go, but handle gracefully) + final_params = _resolve_remaining_names(params) + return ", ".join(final_params) + + def _is_type_like(param: str) -> bool: + """Check if parameter looks like a type (starts with type indicators).""" + return param and (param.startswith('*') or param.startswith('[') or param[0].isupper()) + + def _extract_type_from_shorthand(parts: List[str]) -> str: + """Extract type from shorthand notation (e.g., 'offset, size int64').""" + type_part = parts[-1] + first_part = parts[0] + name_list = [n.strip() for n in first_part.split(',')] + # Return type repeated for each name + return ", ".join([type_part] * len(name_list)) + + def _extract_type_from_regular(parts: List[str]) -> str: + """Extract type from regular notation (e.g., 'name Type' or 'name *package.Type').""" + if len(parts) == 2: + return parts[-1] + # Multiple words: might be name *package.Type + # Remove first word (the name) + return " ".join(parts[1:]) + + def normalize_single_param(param: str) -> str: + """Normalize a single parameter, handling shorthand notation.""" + # Remove leading 
parameter names + # Pattern: name Type or name1, name2 Type + # Handle: offset, size int64 -> int64 (expand to int64, int64) + + parts = param.split() + if len(parts) < 2: + # Single identifier - might be just a type or just a name + if _is_type_like(param): + return param + # Otherwise, it's probably just a name - return as-is (caller will handle) + return param + + # Check if first part has commas (shorthand) + first_part = parts[0] + if ',' in first_part: + # Shorthand: offset, size int64 + return _extract_type_from_shorthand(parts) + # Regular: name Type - remove the name, keep the type + return _extract_type_from_regular(parts) + + # Extract and normalize function signatures + # Pattern: func Name(params) returns or func (r Receiver) Name(params) returns + func_match = _RE_FUNC_WITH_PARAMS.match(sig_str) + if func_match: + name = func_match.group(1) + params = func_match.group(2) + returns = func_match.group(3).strip() + + normalized_params = normalize_param_list(params) + # For returns, keep names and types - they must match exactly + # Expand shorthand in returns too + normalized_returns = normalize_param_list(returns) if returns else "" + + # Reconstruct using shared helper + receiver_match = _RE_RECEIVER_MATCH.match(sig_str) + if receiver_match: + receiver = receiver_match.group(1) + receiver_type = extract_receiver_type(receiver) + return _format_normalized_signature( + name=name, + normalized_params=normalized_params, + normalized_returns=normalized_returns, + receiver_type=receiver_type, + has_multiple_returns=False, # Not used when always_paren_returns=True + always_paren_returns=True + ) + # For functions without receiver, always use parentheses for returns + # (this matches the behavior expected by sync validation) + return _format_normalized_signature( + name=name, + normalized_params=normalized_params, + normalized_returns=normalized_returns, + receiver_type=None, + has_multiple_returns=False, # Not used when always_paren_returns=True + 
always_paren_returns=True + ) + + return sig_str + + +class InterfaceParser: + """ + Helper class for parsing Go interfaces with brace depth tracking. + + This handles the common pattern of tracking interface definitions + and their methods across multiple scripts. + """ + + def __init__(self): + self.in_interface = False + self.current_interface: Optional[str] = None + self.brace_depth = 0 + + def reset(self): + """Reset the parser state.""" + self.in_interface = False + self.current_interface = None + self.brace_depth = 0 + + def check_interface_start(self, line: str) -> Optional[str]: + """ + Check if a line starts an interface definition. + + Args: + line: The line to check + + Returns: + Interface name if this line starts an interface, None otherwise + """ + # Pattern: type Name interface { or type Name[T] interface { + interface_match = re.match( + r'^\s*type\s+(\w+)(?:\s*(\[[^\]]+\]))?\s+interface\s*\{', line + ) + if interface_match: + self.in_interface = True + self.current_interface = interface_match.group(1) + stripped = line.strip() + self.brace_depth = stripped.count('{') - stripped.count('}') + return self.current_interface + return None + + def update_brace_depth(self, line: str) -> bool: + """ + Update brace depth for current interface. 
+ + Args: + line: The current line + + Returns: + True if still inside interface, False if interface closed + """ + if not self.in_interface: + return False + + stripped = line.strip() + self.brace_depth += stripped.count('{') - stripped.count('}') + + if self.brace_depth <= 0: + self.in_interface = False + self.current_interface = None + return False + + return True + + def is_in_interface(self) -> bool: + """Check if currently parsing an interface.""" + return self.in_interface + + def get_current_interface(self) -> Optional[str]: + """Get the name of the current interface being parsed.""" + return self.current_interface + + +def extract_interfaces_from_go_file( + file_path: Path, + parse_methods: bool = True +) -> List[Signature]: + """ + Extract all interfaces (and optionally their methods) from a Go source file. + + Args: + file_path: Path to the Go source file + parse_methods: If True, also extract interface methods as separate signatures + + Returns: + List of Signature objects for interfaces (and their methods if parse_methods=True) + """ + interfaces = [] + methods = [] + + try: + resolved_path = file_path.resolve() + content = file_path.read_text(encoding='utf-8') + lines = content.split('\n') + + interface_parser = InterfaceParser() + + for line_num, line in enumerate(lines, 1): + stripped = line.strip() + + # Skip empty lines and comments + if not stripped or stripped.startswith('//'): + continue + + # Check for interface start using InterfaceParser + interface_name = interface_parser.check_interface_start(line) + if interface_name: + is_public = is_public_name(interface_name) if interface_name else False + interfaces.append(Signature( + name=interface_name, + kind='interface', + location=f"{resolved_path}:{line_num}", + is_public=is_public + )) + continue + + # Track interface brace depth using InterfaceParser + if interface_parser.is_in_interface(): + # Check brace depth before updating to catch methods on closing line + brace_depth_before = 
interface_parser.brace_depth + current_interface = interface_parser.get_current_interface() + still_in_interface = interface_parser.update_brace_depth(line) + + # Check for interface method if we're still in interface or on closing line + if parse_methods and ( + (still_in_interface and interface_parser.brace_depth > 0) or + (brace_depth_before > 0 and not still_in_interface and '{' not in stripped) + ): + sig = parse_go_def_signature(line, location=f"{resolved_path}:{line_num}") + if sig and sig.kind in ('func', 'method'): + # Interface methods don't have receivers in the interface definition + # but we track them as methods of the interface type + methods.append(Signature( + name=sig.name, + kind='method', + receiver=current_interface, + params=sig.params, + returns=sig.returns, + location=f"{resolved_path}:{line_num}", + is_public=sig.is_public + )) + + if not still_in_interface: + # Interface closed, parser already reset + pass + continue + + except (OSError, UnicodeDecodeError, ValueError, KeyError) as e: + print(f"Warning: Error reading {file_path}: {e}") + + # Return interfaces first, then methods + return interfaces + methods + + +def _maybe_append_interface_method( + line, line_num, resolved_path, interface_parser, *, + parse_methods, methods, stripped +): + """If line is an interface method in body, append Signature to methods.""" + brace_depth_before = interface_parser.brace_depth + current_interface = interface_parser.get_current_interface() + still_in_interface = interface_parser.update_brace_depth(line) + if not parse_methods: + return + if not ( + (still_in_interface and interface_parser.brace_depth > 0) or + (brace_depth_before > 0 and not still_in_interface and '{' not in stripped) + ): + return + sig = parse_go_def_signature(line, location=f"{resolved_path}:{line_num}") + if sig and sig.kind in ('func', 'method'): + methods.append(Signature( + name=sig.name, + kind='method', + receiver=current_interface, + params=sig.params, + returns=sig.returns, 
+ location=f"{resolved_path}:{line_num}", + is_public=sig.is_public, + has_body=False + )) + + +def _count_interface_methods(block_lines, start_index, interface_parser) -> int: + """Count method signatures in interface body from block_lines starting at start_index.""" + method_count = 0 + temp_brace_depth = interface_parser.brace_depth + for j in range(start_index + 1, len(block_lines)): + temp_line = block_lines[j] + temp_stripped = temp_line.strip() + if not temp_stripped or temp_stripped.startswith('//'): + continue + temp_brace_depth += temp_stripped.count('{') - temp_stripped.count('}') + temp_sig = parse_go_def_signature(temp_line, location="") + if (temp_sig and temp_sig.kind in ('func', 'method') and temp_brace_depth > 0): + method_count += 1 + if temp_brace_depth <= 0: + break + return method_count + + +def extract_interfaces_from_markdown( + content: str, + file_path: Path, + *, + start_line: int = 1, + parse_methods: bool = True, + skip_examples: bool = True, + lines: Optional[List[str]] = None, +) -> List[Signature]: + """ + Extract all interfaces (and optionally their methods) from Go code blocks in markdown. 
+ + Args: + content: Markdown content as string + file_path: Path to the markdown file (for location strings) + start_line: Starting line number for the content (default: 1) + parse_methods: If True, also extract interface methods as separate signatures + skip_examples: If True, skip interfaces that are marked as examples + lines: Optional list of all lines in the file (for example detection) + + Returns: + List of Signature objects for interfaces (and their methods if parse_methods=True) + """ + _ = start_line # reserved for future use + interfaces = [] + methods = [] + + try: + resolved_path = file_path.resolve() + go_blocks = find_go_code_blocks(content) + + for block_start_line, _block_end_line, code_content in go_blocks: + # Process each code block + block_lines = code_content.split('\n') + interface_parser = InterfaceParser() + + for i, line in enumerate(block_lines): + # Calculate actual line number in file (1-indexed) + line_num = block_start_line + i + + stripped = line.strip() + + # Skip empty lines and comments + if not stripped or stripped.startswith('//'): + continue + + # Check if this is an example signature + is_example = False + if skip_examples: + is_example = is_example_code( + code_content, block_start_line, + lines=lines, + check_single_line=i + ) + + # Check for interface start using InterfaceParser + interface_name = interface_parser.check_interface_start(line) + if interface_name: + # Skip if this is an example + if is_example: + continue + + # Extract generic parameters from the line + generic_match = re.match( + r'^\s*type\s+\w+\s*(\[[^\]]+\])?\s+interface\s*\{', line + ) + generic_params = generic_match.group(1) if generic_match else None + is_public = is_public_name(interface_name) if interface_name else False + + # Check if this is a stub (interface without body or minimal body) + has_full_body = interface_parser.brace_depth > 0 + + method_count = ( + _count_interface_methods(block_lines, i, interface_parser) + if (has_full_body and 
parse_methods) + else 0 + ) + interfaces.append(Signature( + name=interface_name, + kind='interface', + location=f"{resolved_path}:{line_num}", + is_public=is_public, + has_body=has_full_body, + method_count=method_count, + generic_params=generic_params + )) + continue + + if interface_parser.is_in_interface(): + _maybe_append_interface_method( + line, line_num, resolved_path, interface_parser, + parse_methods=parse_methods, methods=methods, stripped=stripped + ) + continue + + except (OSError, UnicodeDecodeError, ValueError, KeyError) as e: + print(f"Warning: Error processing interfaces from {file_path}: {e}") + + # Return interfaces first, then methods + return interfaces + methods + + +def find_definition_line_index(code: str, is_type: bool) -> Optional[int]: + """ + Find the line index (0-indexed) of the first definition in code. + + Args: + code: Go code block content + is_type: True to find type definition, False to find function definition + + Returns: + Line index (0-indexed) of the definition, or None if not found + """ + lines = code.split('\n') + for i, line in enumerate(lines): + stripped = line.strip() + # Skip comments and empty lines + if not stripped or stripped.startswith('//'): + continue + + sig = parse_go_def_signature(line) + if sig: + if is_type: + # Check for type definition + if sig.kind not in ('func', 'method'): + return i + else: + # Check for function/method definition + if sig.kind in ('func', 'method'): + return i + + return None + + +def is_example_definition( + code: str, + start_line: int, + lines: List[str], + heading: Optional[str], + is_type: bool +) -> bool: + """ + Check if a definition (type or function) is example code. + + Uses the existing find_definition_line_index to find the signature, + then checks if it's example code using the unified is_example_code utility. 
+ + Args: + code: Go code block content + start_line: Line number where code block starts (1-indexed) + lines: All lines of the file (for context checking) + heading: Optional heading text (for example detection) + is_type: True for type definitions, False for function/method definitions + + Returns: + True if the definition is example code, False otherwise + """ + def_line_idx = find_definition_line_index(code, is_type=is_type) + if def_line_idx is None: + return False + + return is_example_code( + code, start_line, + lines=lines, + heading_text=heading, + check_prose_before_block=True, + check_single_line=def_line_idx # 0-indexed line number within code block + ) + + +def count_go_definitions( + code: str, + filter_example: bool = False, + lines: Optional[List[str]] = None, + start_line: int = 1, + heading_text: Optional[str] = None +) -> Dict[str, int]: + """ + Count Go definitions in code using the unified parser. + + Args: + code: Go code block content + filter_example: If True, skip example code when counting + lines: All lines of the file (required if filter_example=True) + start_line: Line number where code block starts (required if filter_example=True) + heading_text: Optional heading text (used for example detection if filter_example=True) + + Returns: + Dictionary with counts: + - 'func': count of functions + - 'method': count of methods + - 'type': count of all types (struct, interface, alias, etc., excluding function types) + - 'func_type': count of function types (type Name func(...)) + """ + counts: Dict[str, int] = { + 'func': 0, + 'method': 0, + 'type': 0, + 'func_type': 0 + } + + code_lines = code.split('\n') + for line_idx, line in enumerate(code_lines): + # Filter example code if requested + if filter_example and lines is not None: + if is_example_code( + code, start_line, + lines=lines, + heading_text=heading_text, + check_prose_before_block=True, + check_single_line=line_idx + ): + continue + + # Try to parse as definition + sig = 
parse_go_def_signature(line) + if sig: + if sig.kind == 'func': + counts['func'] += 1 + elif sig.kind == 'method': + counts['method'] += 1 + else: + # All other kinds are types (struct, interface, alias, etc.) + counts['type'] += 1 + else: + # Check for function types (parse_go_def_signature excludes these) + stripped = line.strip() + if stripped and not stripped.startswith('//'): + if _RE_FUNC_TYPE_DEF.match(line): + counts['func_type'] += 1 + + return counts + + +def is_definition_start_line(line: str) -> bool: + """ + Return True if the line starts a type, func, method, or func type definition. + + Used to detect signature-only code blocks: only definition-start lines + and continuation lines (braces, indented body) should be present. + + Args: + line: Single line of Go code (may have leading/trailing whitespace). + + Returns: + True if the line is a definition start (type, func, method, func type). + """ + stripped = line.strip() + if not stripped: + return False + if parse_go_def_signature(stripped): + return True + if _RE_FUNC_TYPE_DEF.match(stripped): + return True + return False + + +def is_continuation_line(line: str) -> bool: + """ + Return True if the line is brace-only or indented (part of a definition body). + + Used together with is_definition_start_line to classify lines in a code block + when detecting signature-only blocks (struct/interface bodies, func bodies). + + Args: + line: Single line of Go code (may have leading/trailing whitespace). + + Returns: + True if the line is only braces or starts with whitespace (continuation). + """ + stripped = line.strip() + if stripped in ('{', '}'): + return True + if len(line) > 0 and line[0].isspace(): + return True + return False + + +def is_signature_only_code_block(code: str) -> bool: + """ + Return True if the Go code block contains only definition signatures and bodies. 
+ + After removing comments, every non-empty line must be either a definition-start + line (type/func/method/func type) or a continuation line (brace-only or + indented body). Comment lines are removed and not counted. + + Used by documentation audits to skip requirement coverage for sections that + only list API signatures (e.g. struct definitions with fields, method stubs). + + Args: + code: Go code block content (no markdown fences). + + Returns: + True if the block has at least one definition and no other substantive lines. + """ + cleaned = remove_go_comments(code, multiline=True) + non_empty_lines = [line for line in cleaned.split('\n') if line.strip()] + if not non_empty_lines: + return False + definition_start_count = 0 + for line in non_empty_lines: + if is_definition_start_line(line): + definition_start_count += 1 + elif not is_continuation_line(line): + return False + return definition_start_count >= 1 + + +def find_first_definition(code: str, is_type: bool) -> Optional[Signature]: + """ + Find and parse the first definition in code. + + This is a convenience function that combines finding the definition line + and parsing it into a Signature object. + + Args: + code: Go code block content + is_type: True for type definitions, False for function/method definitions + + Returns: + Signature object or None if no definition found + """ + def_line_idx = find_definition_line_index(code, is_type=is_type) + if def_line_idx is None: + return None + code_lines = code.split('\n') + return parse_go_def_signature(code_lines[def_line_idx]) + + +def check_kind_word_after( + heading: str, + search_term: str, + kind_word: str, + *, + display_name: str, + error_prefix: str, + errors: List[str], +) -> None: + """ + Check if kind word appears immediately after the search term in heading. 
+ + Args: + heading: The heading text (will be converted to lowercase internally) + search_term: The search term to look for (e.g., "Package" or "FileEntry.GetState") + kind_word: The expected kind word (e.g., "Method", "Function", "Struct") + display_name: The name to display in error messages + error_prefix: Prefix for error messages (e.g., "Method heading") + errors: List to append error messages to + + Returns: + None (modifies errors list in place) + """ + heading_lower = heading.lower() + search_term_lower = search_term.lower() + kind_word_lower = kind_word.lower() + + if search_term_lower in heading_lower: + # Find position of search term in heading (case-insensitive) + search_pos = heading_lower.find(search_term_lower) + if search_pos != -1: + # Check if kind word appears immediately after (allow optional ` and whitespace) + after = heading_lower[search_pos + len(search_term_lower):].strip() + after_search = after.lstrip('`').strip() + if not after_search.startswith(kind_word_lower): + errors.append( + f'{error_prefix} should include "{kind_word}" immediately after ' + f'{display_name}' + ) + else: + # If search term is present but we couldn't find it case-insensitively, + # check if kind word is anywhere in heading + if kind_word_lower not in heading_lower: + errors.append( + f'{error_prefix} should include "{kind_word}" immediately after ' + f'{display_name}' + ) diff --git a/scripts/lib/heading_numbering/__init__.py b/scripts/lib/heading_numbering/__init__.py new file mode 100644 index 00000000..1009291c --- /dev/null +++ b/scripts/lib/heading_numbering/__init__.py @@ -0,0 +1,23 @@ +""" +Heading numbering validation helpers. + +Public surface for C0302 module splitting (validate_heading_numbering). 
+""" + +from lib.heading_numbering._checks import ( + check_duplicate_headings, + check_excessive_numbering, + check_h2_period_consistency, + check_heading_capitalization, + check_organizational_headings, + check_single_word_headings, +) + +__all__ = [ + "check_duplicate_headings", + "check_excessive_numbering", + "check_h2_period_consistency", + "check_heading_capitalization", + "check_organizational_headings", + "check_single_word_headings", +] diff --git a/scripts/lib/heading_numbering/_checks.py b/scripts/lib/heading_numbering/_checks.py new file mode 100644 index 00000000..f351ad85 --- /dev/null +++ b/scripts/lib/heading_numbering/_checks.py @@ -0,0 +1,283 @@ +""" +Heading numbering check functions. + +Extracted from validate_heading_numbering.py for C0302 module splitting. +Callers pass mutable issues list and optional first_error_line dict; these +functions append issues and update first_error_line where relevant. +""" + +from collections import defaultdict +from pathlib import Path +from typing import Callable, List, Optional + +from lib._validation_utils import ( + ValidationIssue, + build_heading_hierarchy, + is_organizational_heading, +) +from lib._validate_heading_numbering_helpers import is_go_code_related_heading +from lib._validate_heading_numbering_models import ( + MAX_HEADING_NUMBER_SEGMENT, + MAX_ORGANIZATIONAL_PROSE_LINES, +) +from lib._validate_heading_numbering_title_case import to_title_case + + +def check_excessive_numbering( + issues: List, + filepath: str, + headings: List, +) -> None: + """ + Check for H3+ headings where the depth-specific number exceeds 20. + Performed after corrected heading numbering is calculated. 
+ """ + if not headings: + return + h3_plus_headings = [h for h in headings if h.level >= 3] + if not h3_plus_headings: + return + for heading in h3_plus_headings: + if not heading.corrected_number: + continue + try: + number_segments = [int(n) for n in heading.corrected_number.split('.')] + except (ValueError, AttributeError): + continue + segment_index = heading.level - 2 + if segment_index >= len(number_segments): + continue + segment = number_segments[segment_index] + if segment > MAX_HEADING_NUMBER_SEGMENT: + msg = ( + f"H{heading.level} heading has numbering " + f"'{heading.corrected_number}' where number {segment} " + f"(at depth {heading.level - 1}) exceeds " + f"{MAX_HEADING_NUMBER_SEGMENT}. " + "Consider restructuring the document to reduce nesting depth." + ) + warning = ValidationIssue.create( + "heading_excessive_numbering", + Path(filepath), + heading.line_num, + heading.line_num, + message=msg, + severity='warning', + heading=heading.full_line, + heading_info=heading + ) + issues.append(warning) + + +def check_single_word_headings( + issues: List, + filepath: str, + headings: List, +) -> None: + """ + Check for H4+ headings where the title (after numbering) is a single word. + """ + if not headings: + return + h4_plus_headings = [h for h in headings if h.level >= 4] + if not h4_plus_headings: + return + for heading in h4_plus_headings: + if not heading.heading_text: + continue + title = heading.heading_text.strip() + if title and ' ' not in title: + msg = (f"H{heading.level} heading has a single-word title '{title}'. 
" + "Consider using a more descriptive multi-word heading.") + warning = ValidationIssue.create( + "heading_single_word", + Path(filepath), + heading.line_num, + heading.line_num, + message=msg, + severity='warning', + heading=heading.full_line, + heading_info=heading + ) + issues.append(warning) + + +def check_duplicate_headings( + issues: List, + first_error_line: dict, + filepath: str, + headings: List, +) -> None: + """ + Check for duplicate headings (excluding numbering) across all levels. + All occurrences after the first are flagged as errors. + """ + if not headings: + return + heading_groups = defaultdict(list) + for heading in headings: + if not heading.heading_text: + continue + normalized_title = heading.heading_text.strip().lower() + if normalized_title: + heading_groups[normalized_title].append(heading) + for normalized_title, heading_list in heading_groups.items(): + if len(heading_list) > 1: + heading_list.sort(key=lambda h: h.line_num) + for duplicate_heading in heading_list[1:]: + other_locations = [ + f"line {h.line_num}" for h in heading_list + if h.line_num != duplicate_heading.line_num + ] + other_locations_str = ", ".join(other_locations) + msg = (f"Duplicate heading title '{duplicate_heading.heading_text}' " + f"(also appears at {other_locations_str}). 
" + "Each heading should have a unique title.") + error = ValidationIssue.create( + "heading_duplicate", + Path(filepath), + duplicate_heading.line_num, + duplicate_heading.line_num, + message=msg, + severity='error', + heading=duplicate_heading.full_line, + heading_info=duplicate_heading + ) + issues.append(error) + if duplicate_heading.issue is None: + duplicate_heading.issue = error + if first_error_line.get(filepath) is None: + first_error_line[filepath] = duplicate_heading.line_num + + +def check_heading_capitalization( + issues: List, + filepath: str, + headings: List, + *, + is_go_related: Optional[Callable[[str], bool]] = None, + to_title: Optional[Callable[[str], str]] = None, +) -> None: + """ + Check if headings follow Title Case. + Skips headings that reference Go code elements (use actual identifiers). + """ + if not headings: + return + if is_go_related is None: + is_go_related = is_go_code_related_heading + if to_title is None: + to_title = to_title_case + for heading in headings: + if not heading.heading_text: + continue + if is_go_related(heading.heading_text): + continue + corrected = to_title(heading.heading_text) + if heading.heading_text != corrected: + heading.corrected_capitalization = corrected + msg = "Capitalization may not follow title case." + warning = ValidationIssue.create( + "heading_capitalization", + Path(filepath), + heading.line_num, + heading.line_num, + message=msg, + severity='warning', + heading=heading.full_line, + heading_info=heading + ) + issues.append(warning) + + +def check_organizational_headings( + issues: List, + _first_error_line: dict, + filepath: str, + headings: List, + content: str, + *, + log_fn: Optional[Callable[[str], None]] = None, +) -> None: + """ + Check for organizational headings with no content. + Warnings for headings that are purely organizational with no content. 
+ """ + if not headings: + return + headings_for_hierarchy = [ + (h.line_num, h.level, h.heading_text) + for h in headings + ] + headings_for_hierarchy.sort(key=lambda x: x[0]) + hierarchy = build_heading_hierarchy(headings_for_hierarchy) + for heading in headings: + if heading.issue: + continue + try: + result = is_organizational_heading( + content, + heading.line_num, + heading.level, + headings_for_hierarchy, + hierarchy, + max_prose_lines=MAX_ORGANIZATIONAL_PROSE_LINES + ) + if result.get('is_organizational') and result.get('is_empty'): + msg = ("Organizational heading with no content. " + "Headings should have substantive content or be removed.") + warning = ValidationIssue.create( + "organizational_heading", + Path(filepath), + heading.line_num, + heading.line_num, + message=msg, + severity='warning', + heading=heading.full_line, + heading_info=heading + ) + issues.append(warning) + except (ValueError, IndexError, KeyError) as e: + if log_fn: + log_fn(f" Error checking organizational heading at line {heading.line_num}: {e}") + except (TypeError, AttributeError, RuntimeError) as e: + if log_fn: + log_fn( + f" Unexpected error checking organizational heading at line " + f"{heading.line_num}: {e}" + ) + + +def check_h2_period_consistency( + issues: List, + filepath: str, + headings: List, +) -> None: + """ + Check if H2 headings have consistent period usage. + If first H2 has period, all should have period; otherwise none should. 
+ """ + h2_headings = [h for h in headings if h.level == 2] + if not h2_headings: + return + h2_headings.sort(key=lambda h: h.line_num) + first_h2 = h2_headings[0] + expected_has_period = first_h2.has_period + for heading in h2_headings[1:]: + if heading.has_period != expected_has_period: + expected_str = "with period" if expected_has_period else "without period" + actual_str = "with period" if heading.has_period else "without period" + msg = (f"H2 heading period inconsistency: first H2 is {expected_str}, " + f"but this heading is {actual_str}. " + f"All H2 headings should match the first one.") + warning = ValidationIssue.create( + "heading_period_inconsistency", + Path(filepath), + heading.line_num, + heading.line_num, + message=msg, + severity='warning', + heading=heading.full_line, + heading_info=heading + ) + issues.append(warning) diff --git a/scripts/lib/validation/__init__.py b/scripts/lib/validation/__init__.py new file mode 100644 index 00000000..d88636da --- /dev/null +++ b/scripts/lib/validation/__init__.py @@ -0,0 +1,115 @@ +""" +Validation utilities package. + +Re-exports from _core, _output, _fs, _markdown for backward compatibility. 
+""" + +from lib.validation._core import ( + DOCS_DIR, + FEATURES_DIR, + REQUIREMENTS_DIR, + TECH_SPECS_DIR, + get_validation_exit_code, + get_workspace_root, + import_module_with_fallback, + parse_paths, +) +from lib.validation._output import ( + COLOR_GREEN, + COLOR_RED, + COLOR_RESET, + COLOR_YELLOW, + SEPARATOR_WIDTH, + OutputBuilder, + ValidationIssue, + calculate_label_width, + colorize, + format_issue_message, + format_summary_line, + parse_no_color_flag, + supports_color, +) +from lib.validation._fs import ( + FileContentCache, + find_feature_files, + find_markdown_files, + is_in_dot_directory, +) +from lib.validation._markdown import ( + HeadingContext, + ProseSection, + build_heading_hierarchy, + contains_url, + count_sentences, + extract_headings, + extract_headings_from_file, + extract_headings_with_anchors, + extract_h2_plus_headings_with_sections, + extract_headings_with_section_numbers, + find_heading_before_line, + find_heading_for_code_block, + generate_anchor_from_heading, + get_backticks_error_message, + get_common_abbreviations, + get_subheadings, + has_backticks, + has_code_blocks, + is_organizational_heading, + is_safe_path, + remove_backticks_keep_content, + validate_anchor, + validate_file_name, + validate_spec_file_name, +) + +__all__ = [ + "DOCS_DIR", + "FEATURES_DIR", + "REQUIREMENTS_DIR", + "TECH_SPECS_DIR", + "COLOR_GREEN", + "COLOR_RED", + "COLOR_RESET", + "COLOR_YELLOW", + "SEPARATOR_WIDTH", + "OutputBuilder", + "ValidationIssue", + "calculate_label_width", + "colorize", + "format_issue_message", + "format_summary_line", + "parse_no_color_flag", + "supports_color", + "FileContentCache", + "find_feature_files", + "find_markdown_files", + "is_in_dot_directory", + "HeadingContext", + "ProseSection", + "build_heading_hierarchy", + "contains_url", + "count_sentences", + "extract_headings", + "extract_headings_from_file", + "extract_headings_with_anchors", + "extract_h2_plus_headings_with_sections", + 
"extract_headings_with_section_numbers", + "find_heading_before_line", + "find_heading_for_code_block", + "generate_anchor_from_heading", + "get_backticks_error_message", + "get_common_abbreviations", + "get_subheadings", + "has_backticks", + "has_code_blocks", + "is_organizational_heading", + "is_safe_path", + "remove_backticks_keep_content", + "validate_anchor", + "validate_file_name", + "validate_spec_file_name", + "get_validation_exit_code", + "get_workspace_root", + "import_module_with_fallback", + "parse_paths", +] diff --git a/scripts/lib/validation/_core.py b/scripts/lib/validation/_core.py new file mode 100644 index 00000000..2ff19451 --- /dev/null +++ b/scripts/lib/validation/_core.py @@ -0,0 +1,71 @@ +""" +Core constants and path/workspace helpers for validation scripts. +""" + +import importlib +import types +from pathlib import Path +from typing import Optional, List + +# Standard directory names used across validation scripts +DOCS_DIR = 'docs' +TECH_SPECS_DIR = 'tech_specs' +REQUIREMENTS_DIR = 'requirements' +FEATURES_DIR = 'features' + + +def get_validation_exit_code(has_errors, no_fail=False): + """ + Get the appropriate exit code for validation scripts. + + Args: + has_errors: True if validation errors were found, False otherwise + no_fail: If True, always return 0 (even if errors were found) + + Returns: + 0 if no errors found or no_fail is True, 1 if errors were found + """ + if no_fail: + return 0 + return 0 if not has_errors else 1 + + +def get_workspace_root() -> Path: + """ + Get the workspace root directory (parent of scripts directory). + + Returns: + Path to workspace root + """ + # From scripts/lib/validation/_core.py: validation -> lib -> scripts -> repo + script_dir = Path(__file__).parent + return script_dir.parent.parent.parent + + +def import_module_with_fallback(module_name: str, _script_dir: Path) -> types.ModuleType: + """ + Import a module by name. 
+ + Args: + module_name: Name of module to import (e.g., '_validation_utils') + _script_dir: Directory containing the module file (unused) + + Returns: + Imported module + """ + return importlib.import_module(module_name) + + +def parse_paths(path_str: Optional[str]) -> Optional[List[str]]: + """ + Parse comma-separated path string into list of paths. + + Args: + path_str: Comma-separated string of paths, or None + + Returns: + List of trimmed path strings, or None if path_str is None/empty + """ + if not path_str: + return None + return [p.strip() for p in path_str.split(',') if p.strip()] diff --git a/scripts/lib/validation/_fs.py b/scripts/lib/validation/_fs.py new file mode 100644 index 00000000..c7fcc60d --- /dev/null +++ b/scripts/lib/validation/_fs.py @@ -0,0 +1,291 @@ +"""File and path discovery for validation scripts.""" + +import sys +from pathlib import Path +from typing import Dict, List, Optional, Set + + +_DEFAULT_EXCLUDE_DIRS: Set[str] = { + 'node_modules', 'vendor', 'tmp', '.git', '.venv', 'venv', + '__pycache__', '.pytest_cache', 'dist', 'build', + '.idea', '.vscode', '.cache' +} + + +def is_in_dot_directory(path: Path) -> bool: + """ + Check if a path contains any directory starting with '.'. + + Args: + path: Path object to check + + Returns: + True if path contains any directory starting with '.' (except '.' 
itself), False otherwise + """ + for part in path.parts: + if part.startswith('.') and part != '.': + return True + return False + + +def _resolve_search_dir( + root_dir: Optional[Path], + default_dir: Optional[Path], +) -> Path: + """Resolve search directory from root_dir and default_dir.""" + if root_dir is not None: + return root_dir + if default_dir is not None: + return default_dir + return Path('.') + + +def _collect_md_from_target_paths( + target_paths: List[str], + md_files: List[Path], + verbose: bool, +) -> None: + """Append markdown files from target paths to md_files.""" + for target_path in target_paths: + target = Path(target_path) + if not target.exists(): + if verbose: + print(f"Warning: Target path does not exist: {target_path}", file=sys.stderr) + continue + if target.is_file(): + if target.suffix == '.md' and not is_in_dot_directory(target): + md_files.append(target) + elif verbose: + print( + f"Warning: Target file is not a markdown file: {target_path}", + file=sys.stderr + ) + else: + for md_file in target.rglob('*.md'): + if not is_in_dot_directory(md_file): + md_files.append(md_file) + + +def _collect_md_from_search_dir( + search_dir: Path, + exclude_dirs: Set[str], + default_dir: Optional[Path], + *, + root_dir: Optional[Path], + md_files: List[Path], + verbose: bool, +) -> None: + """Append markdown files from search_dir to md_files.""" + if not search_dir.exists(): + if verbose: + print(f"Error: Search directory does not exist: {search_dir}", file=sys.stderr) + return + if default_dir is not None and root_dir is None: + md_files.extend( + f for f in sorted(search_dir.glob('*.md')) + if not is_in_dot_directory(f) + ) + else: + for md_file in search_dir.rglob('*.md'): + if any(excluded in md_file.parts for excluded in exclude_dirs): + continue + if is_in_dot_directory(md_file): + continue + md_files.append(md_file) + + +def find_markdown_files( + target_paths: Optional[List[str]] = None, + root_dir: Optional[Path] = None, + default_dir: 
Optional[Path] = None, + *, + exclude_dirs: Optional[Set[str]] = None, + verbose: bool = False, + return_strings: bool = False, +) -> List[Path]: + """ + Find markdown files in the repository or target paths. + + Args: + target_paths: Optional list of specific files or directories to check + root_dir: Root directory to search from (when target_paths is None) + default_dir: Default directory to search if target_paths is None and root_dir is None + exclude_dirs: Set of directory names to exclude when scanning root_dir + verbose: Whether to show detailed progress + return_strings: If True, return list of strings instead of Path objects + + Returns: + List of Path objects (or strings if return_strings=True) for markdown files found + """ + md_files: List[Path] = [] + exclude = exclude_dirs if exclude_dirs is not None else _DEFAULT_EXCLUDE_DIRS + if target_paths: + _collect_md_from_target_paths(target_paths, md_files, verbose) + else: + search_dir = _resolve_search_dir(root_dir, default_dir) + _collect_md_from_search_dir( + search_dir, exclude, default_dir, + root_dir=root_dir, md_files=md_files, verbose=verbose, + ) + if return_strings: + return sorted([str(f) for f in md_files]) + return sorted(md_files) + + +def _collect_feature_from_target_paths( + target_paths: List[str], + feature_files: List[Path], + verbose: bool, +) -> None: + """Append feature files from target paths to feature_files.""" + for target_path in target_paths: + target = Path(target_path) + if not target.exists(): + if verbose: + print(f"Warning: Target path does not exist: {target_path}", file=sys.stderr) + continue + if target.is_file(): + if target.suffix == '.feature' and not is_in_dot_directory(target): + feature_files.append(target) + elif verbose: + print( + f"Warning: Target file is not a .feature file: {target_path}", + file=sys.stderr + ) + else: + for feature_file in target.rglob('*.feature'): + if not is_in_dot_directory(feature_file): + feature_files.append(feature_file) + + +def 
_collect_feature_from_search_dir( + search_dir: Path, + exclude_dirs: Set[str], + default_dir: Optional[Path], + *, + root_dir: Optional[Path], + feature_files: List[Path], + verbose: bool, +) -> None: + """Append feature files from search_dir to feature_files.""" + if not search_dir.exists(): + if verbose: + print(f"Error: Search directory does not exist: {search_dir}", file=sys.stderr) + return + if default_dir is not None and root_dir is None: + feature_files.extend( + f for f in sorted(search_dir.rglob('*.feature')) + if not is_in_dot_directory(f) + and not any(excluded in f.parts for excluded in exclude_dirs) + ) + else: + for feature_file in search_dir.rglob('*.feature'): + if any(excluded in feature_file.parts for excluded in exclude_dirs): + continue + if is_in_dot_directory(feature_file): + continue + feature_files.append(feature_file) + + +def find_feature_files( + target_paths: Optional[List[str]] = None, + root_dir: Optional[Path] = None, + default_dir: Optional[Path] = None, + *, + exclude_dirs: Optional[Set[str]] = None, + verbose: bool = False, + return_strings: bool = False, +) -> List[Path]: + """ + Find feature files (.feature) in the repository or target paths. 
+ + Args: + target_paths: Optional list of specific files or directories to check + root_dir: Root directory to search from (when target_paths is None) + default_dir: Default directory to search if target_paths is None and root_dir is None + exclude_dirs: Set of directory names to exclude when scanning root_dir + verbose: Whether to show detailed progress + return_strings: If True, return list of strings instead of Path objects + + Returns: + List of Path objects (or strings if return_strings=True) for feature files found + """ + feature_files: List[Path] = [] + exclude = exclude_dirs if exclude_dirs is not None else _DEFAULT_EXCLUDE_DIRS + if target_paths: + _collect_feature_from_target_paths(target_paths, feature_files, verbose) + else: + search_dir = _resolve_search_dir(root_dir, default_dir) + _collect_feature_from_search_dir( + search_dir, exclude, default_dir, + root_dir=root_dir, feature_files=feature_files, verbose=verbose, + ) + if return_strings: + return sorted([str(f) for f in feature_files]) + return sorted(feature_files) + + +class FileContentCache: + """ + Cache for file contents to avoid repeated reads. + + This class provides efficient caching of file contents to reduce I/O overhead + when the same files are read multiple times during validation. + """ + + def __init__(self): + """Initialize an empty cache.""" + self._cache: Dict[Path, str] = {} + self._lines_cache: Dict[Path, List[str]] = {} + + def get_content(self, file_path: Path) -> str: + """ + Get file content, using cache if available. + + Args: + file_path: Path to the file to read + + Returns: + File content as string + + Raises: + IOError: If file cannot be read + """ + if file_path not in self._cache: + self._cache[file_path] = file_path.read_text(encoding='utf-8') + return self._cache[file_path] + + def get_lines(self, file_path: Path) -> List[str]: + """ + Get file content as list of lines, using cache if available. 
+ + Args: + file_path: Path to the file to read + + Returns: + File content as list of lines (without newline characters) + + Raises: + IOError: If file cannot be read + """ + if file_path not in self._lines_cache: + content = self.get_content(file_path) + self._lines_cache[file_path] = content.split('\n') + return self._lines_cache[file_path] + + def clear(self): + """Clear all cached content.""" + self._cache.clear() + self._lines_cache.clear() + + def has(self, file_path: Path) -> bool: + """ + Check if file content is cached. + + Args: + file_path: Path to check + + Returns: + True if file is cached, False otherwise + """ + return file_path in self._cache diff --git a/scripts/lib/validation/_markdown.py b/scripts/lib/validation/_markdown.py new file mode 100644 index 00000000..c7aee949 --- /dev/null +++ b/scripts/lib/validation/_markdown.py @@ -0,0 +1,970 @@ +"""Markdown and heading utilities for validation scripts.""" + +import re +import sys +from pathlib import Path +from typing import Optional, List, Set, Tuple, Dict +from dataclasses import dataclass + +from lib.validation._fs import FileContentCache + +# Compiled regex patterns for performance (module level) +_RE_HEADING_PATTERN = re.compile(r'^(#{1,6})\s+(.+)$') +_RE_DECIMAL_PATTERN = re.compile(r'\d+\.\d+') +_RE_SENTENCE_END_PATTERN = re.compile(r'[.!?]+(?=\s+|$)') +_RE_HEADING_NUM_PATTERN = re.compile(r'^\d+(?:\.\d+)*$') + + +@dataclass(frozen=True) +class HeadingContext: + """ + Context information about a markdown heading. + + Used to track heading information for code blocks and signatures. + """ + heading_text: str # The heading text (without # markers) + heading_level: int # Heading depth (1-6, where 1 is most general) + heading_line: int # Line number of the heading (1-indexed) + file_path: Optional[str] = None # Optional file path for context + + +@dataclass +class ProseSection: + """ + Represents a prose-only section in a markdown document. 
+ + This supports Overview blocks and prose subsections in index-style documents. + """ + + heading_str: str + heading_num: Optional[str] + heading_level: int + heading_line: Optional[int] + content: str + parent_section: Optional["ProseSection"] = None + child_sections: List["ProseSection"] = None + has_code: bool = False + code_blocks: List[Tuple[int, int, str]] = None + file_path: Optional[str] = None + lines: Optional[Tuple[int, int]] = None + + def __post_init__(self) -> None: + if self.child_sections is None: + self.child_sections = [] + if self.code_blocks is None: + self.code_blocks = [] + if self.heading_num is not None: + if not isinstance(self.heading_num, str): + raise ValueError("heading_num must be a string or None") + if not _RE_HEADING_NUM_PATTERN.match(self.heading_num): + raise ValueError( + f"heading_num must be a dotted number like '1', '2.4', or '3.5.6', " + f"got: {self.heading_num!r}" + ) + + def path_label(self) -> str: + parts: List[str] = [] + cur: Optional["ProseSection"] = self + while cur is not None: + parts.append(cur.heading_str) + cur = cur.parent_section + parts.reverse() + return " > ".join(parts) + + +def extract_headings(content: str, skip_code_blocks: bool = True) -> List[Tuple[str, int, int]]: + """ + Extract all headings from markdown content. + + Args: + content: Markdown content as string + skip_code_blocks: If True, skip headings inside code blocks + + Returns: + List of tuples: (heading_text, heading_level, line_number) + Lines are 1-indexed. 
+ """ + headings: List[Tuple[str, int, int]] = [] + lines = content.split('\n') + in_code_block = False + + for i, line in enumerate(lines, 1): + stripped_line = line.strip() + + if skip_code_blocks: + # Check for code block boundaries + if stripped_line.startswith('```'): + in_code_block = not in_code_block + continue + + # Skip lines inside code blocks + if in_code_block: + continue + + # Match markdown headings (# through ######) + match = _RE_HEADING_PATTERN.match(stripped_line) + if match: + heading_level = len(match.group(1)) + heading_text = match.group(2).strip() + headings.append((heading_text, heading_level, i)) + + return headings + + +def extract_headings_from_file( + file_path: Path, skip_code_blocks: bool = True, file_cache: Optional['FileContentCache'] = None +) -> List[Tuple[str, int, int]]: + """ + Extract all headings from a markdown file. + + Args: + file_path: Path to the markdown file + skip_code_blocks: If True, skip headings inside code blocks + file_cache: Optional FileContentCache instance to use for reading files + + Returns: + List of tuples: (heading_text, heading_level, line_number) + Lines are 1-indexed. + """ + try: + if file_cache: + content = file_cache.get_content(file_path) + else: + with open(file_path, 'r', encoding='utf-8') as f: + content = f.read() + return extract_headings(content, skip_code_blocks=skip_code_blocks) + except (OSError, UnicodeDecodeError, ValueError) as e: + print(f"Error reading {file_path}: {e}", file=sys.stderr) + return [] + + +def extract_headings_with_anchors( + file_path: Path, min_level: int = 1, max_level: int = 6, + skip_code_blocks: bool = True, file_cache: Optional['FileContentCache'] = None +) -> Dict[str, Tuple[str, int, int]]: + """ + Extract all headings from a markdown file and generate anchors. 
+ + Args: + file_path: Path to the markdown file + min_level: Minimum heading level to include (1-6, default: 1) + max_level: Maximum heading level to include (1-6, default: 6) + skip_code_blocks: If True, skip headings inside code blocks + file_cache: Optional FileContentCache instance to use for reading files + + Returns: + Dictionary mapping anchor -> (heading_text, heading_level, line_number) + """ + headings_dict = {} + headings = extract_headings_from_file( + file_path, skip_code_blocks=skip_code_blocks, file_cache=file_cache + ) + for heading_text, heading_level, line_num in headings: + if min_level <= heading_level <= max_level: + anchor = generate_anchor_from_heading(heading_text, include_hash=False) + headings_dict[anchor] = (heading_text, heading_level, line_num) + return headings_dict + + +def extract_h2_plus_headings_with_sections( + file_path: Path, skip_code_blocks: bool = True, + file_cache: Optional['FileContentCache'] = None +) -> List[Tuple[int, str, int, str, Optional[str]]]: + """ + Extract H2+ headings (## through ######) with anchors and section numbers. + + Args: + file_path: Path to the markdown file + skip_code_blocks: If True, skip headings inside code blocks + file_cache: Optional FileContentCache instance to use for reading files + + Returns: + List of tuples: (heading_level, heading_text, line_num, anchor, section_anchor) + where: + - heading_level is 2 for ##, 3 for ###, etc. 
+ - anchor is the plain anchor from heading text + - section_anchor is the anchor with section number prefix (if section number exists) + """ + headings_list = [] + headings = extract_headings_from_file( + file_path, skip_code_blocks=skip_code_blocks, file_cache=file_cache + ) + for heading_text, heading_level, line_num in headings: + # Only include H2+ headings (level 2-6) + if heading_level < 2: + continue + + # Extract section number if present (e.g., "1.2.3 Heading" -> "1.2.3") + section_match = re.match(r'^(\d+(?:\.\d+)*)\s+(.+)$', heading_text) + section_anchor = None + + if section_match: + # Heading has section number: "1.2.3 Heading Text" + section_num = section_match.group(1) + section_num_no_dots = section_num.replace('.', '') + heading_text_without_section = section_match.group(2).strip() + # Generate anchor from heading text without section number + anchor = generate_anchor_from_heading(heading_text_without_section, include_hash=False) + # Section anchor: section_num-anchor (e.g., "123-heading-text") + section_anchor = f"{section_num_no_dots}-{anchor}" + else: + # Heading has no section number: just generate anchor from text + anchor = generate_anchor_from_heading(heading_text, include_hash=False) + + headings_list.append((heading_level, heading_text, line_num, anchor, section_anchor)) + return headings_list + + +def extract_headings_with_section_numbers( + file_path: Path, min_level: int = 2, max_level: int = 6, + skip_code_blocks: bool = True, file_cache: Optional['FileContentCache'] = None +) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]: + """ + Parse markdown file to extract all heading anchors and section numbers. 
+ + Args: + file_path: Path to the markdown file + min_level: Minimum heading level to include (1-6, default: 2 for H2+) + max_level: Maximum heading level to include (1-6, default: 6) + skip_code_blocks: If True, skip headings inside code blocks + file_cache: Optional FileContentCache instance to use for reading files + + Returns: + Tuple of (anchors set, sections dict where key is section_num and + value is (heading_text, anchor)) + """ + anchors = set() + sections = {} # section_num -> (heading_text, anchor) + + if not file_path.exists(): + return anchors, sections + + headings = extract_headings_from_file( + file_path, skip_code_blocks=skip_code_blocks, file_cache=file_cache + ) + for heading_text, heading_level, _line_num in headings: + if min_level <= heading_level <= max_level: + # Generate anchor from heading text (without '#' prefix) + anchor = generate_anchor_from_heading(heading_text, include_hash=False) + anchors.add(anchor) + + # Extract section number if present (e.g., "2.1 AddFile Package Method" -> "2.1") + section_match = re.match(r'^(\d+(?:\.\d+)*)', heading_text) + if section_match: + section_num = section_match.group(1) + sections[section_num] = (heading_text, anchor) + + return anchors, sections + + +def find_heading_before_line( + content: str, line_num: int, prefer_deepest: bool = True +) -> Optional[HeadingContext]: + """ + Find the heading context for a given line number in markdown content. + + Args: + content: Markdown content as string + line_num: Target line number (1-indexed) + prefer_deepest: If True, return the most specific (deepest) heading. + If False, return the most recent heading. + + Returns: + HeadingContext if a heading is found before the line, None otherwise. 
+ """ + lines = content.split('\n') + + if line_num < 1 or line_num > len(lines): + return None + + # Find the most recent heading before this line + + if prefer_deepest: + # Track heading stack to find the most specific heading + heading_stack = [] # List of (level, text, line_num) tuples + + for i, line in enumerate(lines[:line_num], 1): + match = _RE_HEADING_PATTERN.match(line.strip()) + if match: + level = len(match.group(1)) + text = match.group(2).strip() + # Remove headings at same or deeper level from stack + heading_stack = [h for h in heading_stack if h[0] < level] + # Add this heading + heading_stack.append((level, text, i)) + + # Get the most specific (deepest) heading + if heading_stack: + last_heading_level, last_heading, last_heading_line = heading_stack[-1] + return HeadingContext( + heading_text=last_heading, + heading_level=last_heading_level, + heading_line=last_heading_line + ) + else: + # Find the most recent heading (not necessarily deepest) + for i in range(line_num - 1, -1, -1): + if i < len(lines): + match = _RE_HEADING_PATTERN.match(lines[i].strip()) + if match: + level = len(match.group(1)) + text = match.group(2).strip() + return HeadingContext( + heading_text=text, + heading_level=level, + heading_line=i + 1 + ) + + return None + + +def find_heading_for_code_block( + content: str, code_block_start_line: int +) -> Optional[str]: + """ + Find the heading text that appears before a code block. + + This is a simpler version that just returns the heading text, + useful for cases where only the text is needed. + + Args: + content: Markdown content as string + code_block_start_line: Line number where the code block starts (1-indexed) + + Returns: + Heading text if found, None otherwise. 
+ """ + ctx = find_heading_before_line(content, code_block_start_line, prefer_deepest=False) + return ctx.heading_text if ctx else None + + +def get_common_abbreviations() -> Set[str]: + """ + Get comprehensive list of common abbreviations (case-insensitive matching). + + Returns: + Set of abbreviations (all lowercase for case-insensitive matching) + """ + return { + # Titles + 'dr.', 'mr.', 'mrs.', 'ms.', 'prof.', + # Academic degrees + 'ph.d.', 'm.d.', 'b.a.', 'm.a.', 'b.s.', 'm.s.', + # Common abbreviations + 'etc.', 'i.e.', 'e.g.', 'vs.', 'a.m.', 'p.m.', + # Business/location + 'inc.', 'ltd.', 'corp.', 'st.', 'ave.', 'blvd.', + } + + +def contains_url(text: str) -> bool: + """ + Check if text contains a URL. + + Detects: + - http:// and https:// URLs + - www. URLs (with word boundaries) + - mailto: links + + Args: + text: Text to check + + Returns: + True if text contains a URL, False otherwise + """ + url_patterns = [ + r'https?://', # http:// or https:// + r'\bwww\.', # www. with word boundary + r'mailto:', # mailto: links + ] + for pattern in url_patterns: + if re.search(pattern, text, re.IGNORECASE): + return True + return False + + +def _is_ellipsis_at(text: str, punct_pos: int) -> bool: + """Return True if punct_pos is part of an ellipsis.""" + if punct_pos > 0 and punct_pos + 2 < len(text): + if text[punct_pos - 1:punct_pos + 2] == '...' or text[punct_pos:punct_pos + 3] == '...': + return True + return punct_pos + 1 < len(text) and text[punct_pos:punct_pos + 2] == '..' 
+ + +def _is_decimal_near(text: str, punct_pos: int) -> bool: + """Return True if punct_pos is inside a decimal number.""" + context = text[max(0, punct_pos - 10):min(len(text), punct_pos + 10)] + return bool(_RE_DECIMAL_PATTERN.search(context)) + + +def _is_url_near(text: str, punct_pos: int, has_urls: bool) -> bool: + """Return True if punct_pos is inside a URL.""" + if not has_urls: + return False + url_context = text[ + max(0, punct_pos - 30):min(len(text), punct_pos + 30) + ] + return contains_url(url_context) + + +def _is_abbreviation_at( + text: str, text_lower: str, punct_pos: int, _abbreviations: set +) -> str: + """Return the word before punct_pos (for abbreviation check). Empty if not found.""" + word_start = punct_pos + while word_start > 0 and (text[word_start - 1].isalnum() or text[word_start - 1] == '.'): + word_start -= 1 + return text_lower[word_start:punct_pos + 1] + + +def _should_skip_sentence_boundary( + text: str, + text_lower: str, + punct_pos: int, + abbreviations: set, + has_urls: bool, +) -> bool: + """Return True if this punctuation position is not a real sentence end.""" + if _is_ellipsis_at(text, punct_pos): + return True + if _is_decimal_near(text, punct_pos): + return True + if _is_url_near(text, punct_pos, has_urls): + return True + word_before = _is_abbreviation_at(text, text_lower, punct_pos, abbreviations) + return word_before in abbreviations + + +def _try_append_hybrid_sentence( + text: str, + punct_pos: int, + *, + punct_end: int, + last_end: int, + word_before: str, + abbreviations: set, + sentences: list, +) -> bool: + """If period+uppercase sentence end, append and return True. Else return False.""" + if text[punct_pos] != '.' 
or punct_end >= len(text): + return False + next_char_pos = punct_end + while next_char_pos < len(text) and text[next_char_pos].isspace(): + next_char_pos += 1 + if (next_char_pos >= len(text) or + not text[next_char_pos].isupper() or + word_before in abbreviations): + return False + sentence = text[last_end:punct_end].strip() + if sentence: + sentences.append(sentence) + return True + + +def count_sentences(text: str) -> int: + """ + Count sentences in text, handling edge cases. + + Splits on sentence-ending punctuation (., !, ?) followed by space/newline. + Handles edge cases: + - Abbreviations: using get_common_abbreviations() (normalize both to lowercase for comparison) + - Decimals: \\d+\\.\\d+ pattern + - URLs: using contains_url function + - Ellipses: ... and Unicode ellipsis (…) + - Hybrid approach: period + uppercase next char AND not in abbreviation list (case-insensitive) + + Args: + text: Text to count sentences in + + Returns: + Number of sentences (0 for empty/whitespace text, minimum 1 if text is non-empty) + """ + if not text or not text.strip(): + return 0 + + abbreviations = get_common_abbreviations() + text_lower = text.lower() + has_urls = contains_url(text) + matches = list(_RE_SENTENCE_END_PATTERN.finditer(text)) + if not matches: + return 1 if text.strip() else 0 + + sentences = [] + last_end = 0 + + for match in matches: + punct_pos = match.start() + punct_end = match.end() + if punct_end < len(text) and text[punct_end].isspace(): + while punct_end < len(text) and text[punct_end].isspace(): + punct_end += 1 + + if _should_skip_sentence_boundary( + text, text_lower, punct_pos, abbreviations, has_urls + ): + continue + + word_before = _is_abbreviation_at(text, text_lower, punct_pos, abbreviations) + if _try_append_hybrid_sentence( + text, punct_pos, + punct_end=punct_end, last_end=last_end, word_before=word_before, + abbreviations=abbreviations, sentences=sentences, + ): + last_end = punct_end + continue + + sentence = 
text[last_end:punct_end].strip() + if sentence: + sentences.append(sentence) + last_end = punct_end + + if last_end < len(text): + remaining = text[last_end:].strip() + if remaining: + sentences.append(remaining) + sentences = [s for s in sentences if s] + return len(sentences) if sentences else (1 if text.strip() else 0) + + +def has_code_blocks(content: str, exclude_languages: Optional[Set[str]] = None) -> bool: + """ + Check if content contains code blocks (any language, excluding specified). + + Extracts first word from language identifier by splitting on any non-alpha character. + Examples: "go example" -> "go", "rust,no_run" -> "rust", "c++" -> "c" + + Args: + content: Markdown content to check + exclude_languages: Optional set of language identifiers to exclude + (e.g., {'text', 'markdown'}) + + Returns: + True if content contains code blocks (excluding specified languages) + """ + lines = content.split('\n') + in_code_block = False + code_block_language = None + + for line in lines: + stripped = line.strip() + if stripped.startswith('```'): + if in_code_block: + # Closing code block + in_code_block = False + code_block_language = None + else: + # Opening code block + in_code_block = True + # Extract language identifier + language_part = stripped[3:].strip() + if language_part: + # Split on any non-alpha character to get first token + match = re.match(r'^([a-zA-Z]+)', language_part) + if match: + code_block_language = match.group(1).lower() + else: + code_block_language = None + else: + code_block_language = None + + # Check if this language should be excluded + if exclude_languages and code_block_language: + if code_block_language in exclude_languages: + # Skip this code block + continue + + # Found a code block that's not excluded + return True + + return False + + +def build_heading_hierarchy( + headings: List[Tuple[int, int, str]] # (line_num, level, text) +) -> Dict[int, Optional[int]]: + """ + Build parent-child relationship mapping for headings. 
+ + Uses heading_stack approach similar to validate_heading_numbering.py. + Each heading finds its most recent parent at the appropriate level. + + Args: + headings: List of (line_num, level, text) tuples, sorted by line_num + + Returns: + Dict mapping heading index (0-based) -> parent heading index (None if no parent). + If H3+ appears before H2, it has no parent (None). + """ + hierarchy = {} + heading_stack = {} # Maps level -> heading_index (current parent at that level) + + for idx, (_line_num, level, _text) in enumerate(headings): + parent_index = None + if level > 2: + # H3 and beyond need a parent + parent_level = level - 1 + parent_index = heading_stack.get(parent_level) + + hierarchy[idx] = parent_index + + # Update heading stack - set this heading as the current parent at its level + heading_stack[level] = idx + + # Clear deeper levels when we move up in hierarchy + levels_to_clear = [lvl for lvl in heading_stack if lvl > level] + for lvl in levels_to_clear: + del heading_stack[lvl] + + return hierarchy + + +def get_subheadings( + heading_index: int, + heading_level: int, + all_headings: List[Tuple[int, int, str]], + hierarchy: Dict[int, Optional[int]] +) -> List[int]: + """ + Get all subheadings (all descendants at any level > heading_level) for a given heading. 
+ + Args: + heading_index: Index of the heading in all_headings list (0-based) + heading_level: Level of the heading + all_headings: List of (line_num, level, text) tuples + hierarchy: Parent-child mapping from build_heading_hierarchy + + Returns: + List of indices (0-based) for all subheadings (all descendants at any level > heading_level) + """ + subheadings = [] + + # Find all headings that are descendants of this heading + # A heading is a descendant if it has this heading in its ancestor chain + def is_descendant(child_idx: int) -> bool: + current = child_idx + while current is not None and current in hierarchy: + parent = hierarchy[current] + if parent == heading_index: + return True + current = parent + return False + + for idx, (_line_num, level, _text) in enumerate(all_headings): + if idx != heading_index and level > heading_level: + if is_descendant(idx): + subheadings.append(idx) + + return subheadings + + +def is_organizational_heading( + content: str, + heading_line: int, + heading_level: int, + all_headings: List[Tuple[int, int, str]], + hierarchy: Dict[int, Optional[int]], + *, + max_prose_lines: int = 5, +) -> dict: + """ + Determine if a heading is purely organizational (grouping only). 
+ + A heading is organizational if: + - Has no code blocks (any language, except text/markdown) + - Has max_prose_lines or fewer sentences + - Only contains subheadings with no substantive content + + Args: + content: Full markdown content + heading_line: Line number of the heading (1-indexed) + heading_level: Level of the heading (2-6) + all_headings: List of (line_num, level, text) tuples + hierarchy: Parent-child mapping from build_heading_hierarchy + max_prose_lines: Maximum sentences before considered non-organizational + + Returns: + Dict with: + - is_organizational: bool - True if heading is organizational + - is_empty: bool - True if heading has no content (0 sentences), + False if it has minor informative content (1-5 sentences) + - sentence_count: int - Number of sentences in the section + """ + # Find the heading index + heading_index = None + for idx, (line_num, level, _text) in enumerate(all_headings): + if line_num == heading_line and level == heading_level: + heading_index = idx + break + + if heading_index is None: + return {'is_organizational': False, 'is_empty': False, 'sentence_count': 0} + + # Find next heading (any level) to determine section boundaries + next_heading_line = None + for line_num, _level, _text in all_headings: + if line_num > heading_line: + next_heading_line = line_num + break + + # Extract section content + lines = content.split('\n') + if next_heading_line: + section_lines = lines[heading_line - 1:next_heading_line - 1] + else: + section_lines = lines[heading_line - 1:] + + section_content = '\n'.join(section_lines) + + # Extract prose (non-heading, non-code-block lines) + prose_lines = [] + in_code_block = False + for line in section_lines[1:]: # Skip the heading line itself + stripped = line.strip() + if stripped.startswith('```'): + in_code_block = not in_code_block + continue + if in_code_block: + continue + # Check if it's a heading + if re.match(r'^#{1,6}\s+', stripped): + continue + if stripped: + 
prose_lines.append(line) + + prose_text = '\n'.join(prose_lines) + + # Count sentences + sentence_count = count_sentences(prose_text) + + # Check for code blocks (excluding text/markdown) + if has_code_blocks(section_content, exclude_languages={'text', 'markdown'}): + return {'is_organizational': False, 'is_empty': False, 'sentence_count': sentence_count} + + # Get subheadings + subheadings = get_subheadings(heading_index, heading_level, all_headings, hierarchy) + + # Return organizational if: no code blocks AND sentence_count <= max_prose_lines AND + # (sentence_count == 0 OR only subheadings) + if sentence_count <= max_prose_lines: + if not sentence_count or (subheadings and len(prose_lines) <= max_prose_lines): + return { + 'is_organizational': True, + 'is_empty': (not sentence_count), + 'sentence_count': sentence_count + } + + return {'is_organizational': False, 'is_empty': False, 'sentence_count': sentence_count} + + +def generate_anchor_from_heading(heading: str, include_hash: bool = False) -> str: + """ + Generate a GitHub-style markdown anchor from heading text. 
+ + This function implements GitHub's markdown anchor generation algorithm: + - Removes backticks but preserves their content (e.g., `` `code` `` -> `code`) + - Converts to lowercase + - Removes special characters except word characters, spaces, and hyphens + - Collapses sequences of spaces and hyphens into a single hyphen, except " - " which becomes "---" + - Strips leading and trailing hyphens + + Args: + heading: The heading text (may contain markdown formatting like backticks) + include_hash: If True, prefix the anchor with '#' (default: False) + + Returns: + The generated anchor string (with '#' prefix if include_hash=True) + + Examples: + >>> generate_anchor_from_heading("1.2.3 AddFile Package Method") + '123-addfile-package-method' + >>> generate_anchor_from_heading("File Management with `Package` type") + 'file-management-with-package-type' + >>> generate_anchor_from_heading("Heading - With Multiple Spaces") + 'heading---with-multiple-spaces' + """ + if not heading: + return "" + + # Remove markdown formatting (backticks) but preserve their content + # This matches GitHub's behavior: `` `code` `` becomes `code` in the anchor + heading_clean = re.sub(r'`([^`]+)`', r'\1', heading) + + # Convert to lowercase + heading_lower = heading_clean.lower() + + # Preserve " - " (space-hyphen-space) as "---" to match GitHub/markdownlint MD051 + _placeholder = 'TRPLDASH' + heading_lower = heading_lower.replace(' - ', _placeholder) + + # Remove special characters except word characters, spaces, and hyphens + anchor = re.sub(r'[^\w\s-]', '', heading_lower) + + # Collapse sequences of spaces and hyphens into a single hyphen + anchor = re.sub(r'[-\s]+', '-', anchor) + + # Restore "---" for " - " so slug matches GitHub/markdownlint + anchor = anchor.replace(_placeholder, '---') + + # Strip leading and trailing hyphens + anchor = anchor.strip('-') + + # Add '#' prefix if requested + if include_hash: + return '#' + anchor if anchor else "" + return anchor + + +def remove_backticks_keep_content(text: str) -> str: + 
""" + Remove backticks from text but keep their contents. + + This removes the backtick characters but preserves the text that was + enclosed in backticks. This is the standard behavior for both validation scripts. + + Args: + text: Text that may contain backticks + + Returns: + Text with backticks removed but content preserved + + Examples: + "Heading with `code` example" => "Heading with code example" + "`func()` and `var`" => "func() and var" + "`code`" => "code" + "No backticks here" => "No backticks here" + """ + if not text: + return text + + # Remove backticks but keep content + # Pattern matches backtick, captures content, matches closing backtick + result = re.sub(r'`([^`]*)`', r'\1', text) + return result + + +def has_backticks(text: str) -> bool: + """ + Check if text contains backticks. + + Args: + text: Text to check for backticks + + Returns: + True if text contains backticks, False otherwise + + Examples: + "Heading with `code`" => True + "Plain text heading" => False + "" => False + None => False + """ + if not text: + return False + return '`' in text + + +def get_backticks_error_message() -> str: + """ + Get the standard error message for backticks in headings. + + Returns: + Standard error message string for backticks in headings + """ + return ("Heading contains backticks. " + "Headings should not contain backticks; use plain text instead.") + + +def is_safe_path(file_path: Path, repo_root: Path) -> bool: + """ + Check if a path is safe (within repo and no traversal). 
+ + Args: + file_path: Path to check + repo_root: Repository root directory + + Returns: + True if path is safe (within repo root), False otherwise + """ + try: + # Resolve to absolute path and check it's within repo + resolved = file_path.resolve() + repo_resolved = repo_root.resolve() + # Compare path components so /repo-evil is not treated as inside /repo + return resolved == repo_resolved or repo_resolved in resolved.parents + except (OSError, ValueError): + return False + + +def validate_file_name(filename: str) -> bool: + """ + Validate that filename is safe (no path traversal, no separators). + + Args: + filename: Filename to validate + + Returns: + True if filename is safe, False otherwise + """ + if not filename: + return False + # No path separators allowed + if '/' in filename or '\\' in filename: + return False + # No parent directory references + if '..' in filename: + return False + # No null bytes + if '\x00' in filename: + return False + return True + + +def validate_spec_file_name(spec_file: str) -> bool: + """ + Validate that spec file name is safe (no path traversal, no separators, .md extension). + + Args: + spec_file: Spec file name to validate + + Returns: + True if spec file name is safe, False otherwise + """ + if not spec_file: + return False + # Must be a simple filename with .md extension + # No path separators allowed + if '/' in spec_file or '\\' in spec_file: + return False + # No parent directory references + if '..' in spec_file: + return False + # Must end with .md + if not spec_file.endswith('.md'): + return False + # Must be a valid filename (alphanumeric, underscore, hyphen, dot) + if not re.match(r'^[a-zA-Z0-9_\-]+\.md$', spec_file): + return False + return True + + +def validate_anchor(anchor: str) -> bool: + """ + Validate that anchor is safe (no path traversal, no separators). 
+ + Args: + anchor: Anchor string to validate + + Returns: + True if anchor is safe, False otherwise + """ + if not anchor: + return True # Empty anchor is OK + if '/' in anchor or '\\' in anchor: + return False + if '..' in anchor: + return False + if '\x00' in anchor: + return False + if not re.match(r'^[a-zA-Z0-9_\-]+$', anchor): + return False + return True diff --git a/scripts/lib/validation/_output.py b/scripts/lib/validation/_output.py new file mode 100644 index 00000000..ca25df6f --- /dev/null +++ b/scripts/lib/validation/_output.py @@ -0,0 +1,933 @@ +"""Output and issue formatting for validation scripts.""" + +import os +import sys +import re +from pathlib import Path +from typing import Optional, Dict + +COLOR_GREEN = "32" +COLOR_RED = "31" +COLOR_YELLOW = "33" +COLOR_RESET = "0" + +# Standard separator width +SEPARATOR_WIDTH = 80 + + +def _section_has_header(lines, separator: str, title: str, max_look: int = 3) -> bool: + """Return True if lines start with separator followed by title within max_look.""" + for i in range(min(max_look, len(lines))): + if lines[i] == separator and i + 1 < len(lines) and lines[i + 1].strip() == title: + return True + return False + + +def _ensure_section_header(section_lines: list, title: str, separator: str) -> list: + """Prepend separator and title header to section_lines if not already present.""" + if _section_has_header(section_lines, separator, title): + return section_lines + return [separator, title, separator] + section_lines + + +def supports_color(no_color_flag=False): + """ + Check if colors should be used. + + Args: + no_color_flag: If True, disable colors regardless of other conditions + + Returns: + True if colors should be used, False otherwise + """ + if no_color_flag or 'NO_COLOR' in os.environ: + return False + return sys.stdout.isatty() + + +def colorize(text, color_code, no_color_flag=False): + """ + Apply color to text if colors are supported. 
+ + Args: + text: Text to colorize + color_code: ANSI color code (e.g., "32" for green) + no_color_flag: If True, disable colors + + Returns: + Colorized text if colors are supported, otherwise original text + """ + if supports_color(no_color_flag): + return f"\033[{color_code}m{text}\033[0m" + return text + + +def format_summary_line(label, value, label_width=25, value_width=6): + """ + Format a summary line with aligned columns. + + Args: + label: Label text (left-aligned) + value: Value to display (right-aligned) + label_width: Width for label column (default: 25) + value_width: Width for value column (default: 6) + + Returns: + Formatted string with aligned columns + """ + return f"{label:<{label_width}} {value:>{value_width}}" + + +def calculate_label_width(labels, min_width=25, max_width=50): + """ + Calculate optimal label width for a set of summary labels. + + Args: + labels: List of label strings + min_width: Minimum label width (default: 25) + max_width: Maximum label width (default: 50) + + Returns: + Optimal label width for formatting + """ + if not labels: + return min_width + max_label_len = max(len(label) for label in labels) + return min(max(max_label_len + 1, min_width), max_width) + + +def parse_no_color_flag(args): + """ + Parse --nocolor or --no-color flag from command line arguments. + + Args: + args: List of command line arguments (typically sys.argv) + + Returns: + True if --nocolor or --no-color flag is present, False otherwise + """ + return '--nocolor' in args or '--no-color' in args + + +def format_issue_message( + severity, issue_type, file_path, + *, + line_num=None, + message=None, + suggestion=None, + no_color=False, +): + """ + Format an error or warning message with consistent structure. 
+ + Args: + severity: Either "error" or "warning" (case-insensitive) + issue_type: Short description of the issue type (e.g., "Heading without Req") + file_path: Path to the file (will be converted to string) + line_num: Optional line number + message: Optional additional message/details + suggestion: Optional suggestion for fixing the issue (formatted as " -> {suggestion}") + no_color: If True, disable colors + + Returns: + Formatted error or warning message string with color applied + """ + severity_lower = severity.lower() + if severity_lower not in ('error', 'warning'): + raise ValueError(f"severity must be 'error' or 'warning', got '{severity}'") + + is_error = severity_lower == 'error' + prefix = "ERROR" if is_error else "WARNING" + color_code = COLOR_RED if is_error else COLOR_YELLOW + + file_str = str(file_path) + if line_num is not None: + location = f"{file_str}:{line_num}" + else: + location = file_str + + # Build the message parts + parts = [f"{prefix}: {issue_type}: {location}"] + + if message: + parts.append(message) + + # Build the full message + if len(parts) > 1: + issue_msg = ": ".join(parts) + else: + issue_msg = parts[0] + + # Add suggestion if present (without extra colon since it already has " -> ") + if suggestion: + issue_msg = f"{issue_msg} -> {suggestion}" + + return colorize(issue_msg, color_code, no_color) + + +class OutputBuilder: + """ + Builder for consistent script output formatting. + + Handles headers, summaries, success messages, and spacing automatically. + Tracks line types (error, warning, info) and supports verbose mode filtering. + Automatically orders output sections in the correct sequence. + """ + + # Line type constants + LINE_INFO = 'info' + LINE_ERROR = 'error' + LINE_WARNING = 'warning' + LINE_VERBOSE = 'verbose' + + def __init__(self, title, description, no_color=False, *, output_file=None, verbose=False): + """ + Initialize the output builder. 
+ + Args: + title: Script title for header + description: Brief description for header + no_color: If True, disable colors + output_file: Optional file path to write output to + verbose: If True, include verbose-only lines in output + """ + # Separate sections for automatic ordering + self.header_lines = [] + self.working_verbose_lines = [] # Working/progress verbose output + self.summary_lines = [] + self.warning_lines = [] + self.error_lines = [] + self.final_message_lines = [] # Success messages, etc. + + # Metadata for each section + self.header_metadata = [] + self.working_verbose_metadata = [] + self.summary_metadata = [] + self.warning_metadata = [] + self.error_metadata = [] + self.final_message_metadata = [] + + self.no_color = no_color + self.output_file = output_file + self.verbose = verbose + self._last_was_blank = {} # Track blank lines per section + self._has_warnings = False + self._has_errors = False + self._header_printed = False # Track if header has been streamed + self._streamed_lines = [] # Track streamed lines for file output + self._streamed_header_count = 0 # Number of header lines streamed + self._streamed_verbose_count = 0 # Number of working_verbose lines streamed + self._summary_header_added = False # Track if summary header has been added + self._has_success_message = False # Track if success message has been added + self._has_failure_message = False # Track if failure message has been added + self._has_warnings_only_message = False # Warnings-only final message (no errors) + self._errors_header_added = False # Track if errors header has been added + self._warnings_header_added = False # Track if warnings header has been added + + # Add header immediately (will stream if verbose) + self.add_header(title, description) + + def _add_to_section(self, section, line, line_type=LINE_INFO, verbose_only=False): + """ + Internal method to add a line to a specific section. 
+ + Strips whitespace-only lines (they should be added via add_blank_line instead). + """ + # Strip whitespace-only lines - they should be added via add_blank_line + # Only process non-empty lines (empty lines should use add_blank_line) + if line and line.strip(): + section_lines = getattr(self, f"{section}_lines") + section_metadata = getattr(self, f"{section}_metadata") + section_lines.append(line) + section_metadata.append((line_type, verbose_only)) + self._last_was_blank[section] = False + + def _add_blank_to_section(self, section): + """Internal method to add a blank line to a specific section.""" + if not self._last_was_blank.get(section, False): + section_lines = getattr(self, f"{section}_lines") + section_metadata = getattr(self, f"{section}_metadata") + section_lines.append("") + section_metadata.append((self.LINE_INFO, False)) + self._last_was_blank[section] = True + + def add_header(self, title, description): + """ + Add script header with separators. + + If verbose=True, prints header immediately. Otherwise buffers it. 
+ + Args: + title: Script title + description: Brief description + """ + separator = "=" * SEPARATOR_WIDTH + header_text = f"{title} - {description}" + header_lines = [separator, header_text, separator] + + # Store in header section for final output + for line in header_lines: + self._add_to_section("header", line) + + # If verbose, print header immediately + if self.verbose and not self._header_printed: + for line in header_lines: + print(line) + if self.output_file: + self._streamed_lines.append(line) + self._header_printed = True + # Count header lines that will be in final output (after filtering) + filtered_header = self._filter_section(self.header_lines, self.header_metadata) + self._streamed_header_count = len(filtered_header) + + def add_summary_header(self): + """Add summary section header.""" + if self._summary_header_added: + return # Already added, avoid duplicates + separator = "=" * SEPARATOR_WIDTH + self._add_to_section("summary", separator) + self._add_to_section("summary", "Summary") + self._add_to_section("summary", separator) + self._summary_header_added = True + + def add_summary_section(self, items, label_width=None, value_width=6): + """ + Add summary items with consistent formatting. 
+ + Automatically adds summary header if: + - There are summary items + - AND (verbose is True OR there are warnings OR there are errors) + - AND summary header hasn't been added yet + + Args: + items: List of (label, value) tuples + label_width: Optional label width (auto-calculated if None) + value_width: Value column width (default: 6) + """ + if not items: + return + + # Automatically add summary header if conditions are met + should_show_summary = self.verbose or self._has_warnings or self._has_errors + if should_show_summary and not self._summary_header_added: + self.add_summary_header() + + if label_width is None: + labels = [item[0] for item in items] + label_width = calculate_label_width(labels) + + for label, value in items: + line = format_summary_line(label, value, label_width, value_width) + self._add_to_section("summary", line) + + def add_success_message(self, message): + """ + Add success message with proper spacing. + + Adds: 1 blank line before, message with ✅ prefix, 1 blank line after. + + Args: + message: Success message text (✅ will be automatically prepended) + """ + # Clear other final messages if present (mutually exclusive) + if self._has_failure_message or self._has_warnings_only_message: + self._clear_final_messages() + self._has_success_message = True + self._has_failure_message = False + self._has_warnings_only_message = False + + self._add_blank_to_section("final_message") + # Automatically prepend ✅ if not already present + if not message.startswith("✅ "): + message = f"✅ {message}" + colored_msg = colorize(message, COLOR_GREEN, self.no_color) + self._add_to_section("final_message", colored_msg) + self._add_blank_to_section("final_message") + + def add_failure_message(self, message): + """ + Add failure message with proper spacing. + + Adds: 1 blank line before, message with ❌ prefix, 1 blank line after. 
+ + Args: + message: Failure message text (❌ will be automatically prepended) + """ + # Clear other final messages if present (mutually exclusive) + if self._has_success_message or self._has_warnings_only_message: + self._clear_final_messages() + self._has_success_message = False + self._has_failure_message = True + self._has_warnings_only_message = False + + self._add_blank_to_section("final_message") + # Automatically prepend ❌ if not already present + if not message.startswith("❌ "): + message = f"❌ {message}" + colored_msg = colorize(message, COLOR_RED, self.no_color) + self._add_to_section("final_message", colored_msg) + self._add_blank_to_section("final_message") + + def add_warnings_only_message( + self, + message="Warnings detected. Review the warnings above.", + verbose_hint=None, + ): + """ + Add warnings-only final message (no errors). + + Use when validation passed but there are warnings. Adds: 1 blank before, + message with ⚠️ prefix (yellow), optional verbose hint line, 1 blank after. + + Args: + message: Main message (⚠️ prepended if not present). + verbose_hint: If set and not verbose, add a second line (e.g. run --verbose). + """ + if self._has_success_message or self._has_failure_message: + self._clear_final_messages() + self._has_success_message = False + self._has_failure_message = False + self._has_warnings_only_message = True + + self._add_blank_to_section("final_message") + if not message.startswith("⚠️ "): + message = f"⚠️ {message}" + colored_msg = colorize(message, COLOR_YELLOW, self.no_color) + self._add_to_section("final_message", colored_msg) + if verbose_hint and not self.verbose: + self._add_to_section("final_message", verbose_hint) + self._add_blank_to_section("final_message") + + def add_errors_header(self): + """ + Add errors section header (standardized, like Summary). + + Note: This only adds the header. The header will only be displayed + if there are actual error lines (not just the header itself). 
+ """ + if self._errors_header_added: + return # Already added, avoid duplicates + self._has_errors = True + separator = "=" * SEPARATOR_WIDTH + self._add_to_section("error", separator, line_type=self.LINE_ERROR) + self._add_to_section("error", "Errors", line_type=self.LINE_ERROR) + self._add_to_section("error", separator, line_type=self.LINE_ERROR) + self._errors_header_added = True + + def add_warnings_header(self): + """Add warnings section header (standardized, like Summary).""" + if self._warnings_header_added: + return # Already added, avoid duplicates + self._has_warnings = True + separator = "=" * SEPARATOR_WIDTH + self._add_to_section("warning", separator, line_type=self.LINE_WARNING) + self._add_to_section("warning", "Warnings", line_type=self.LINE_WARNING) + self._add_to_section("warning", separator, line_type=self.LINE_WARNING) + self._warnings_header_added = True + + def add_separator(self, section="summary"): + """ + Add a separator line to the specified section. + + Args: + section: Section to add separator to (default: "summary") + """ + separator = "=" * SEPARATOR_WIDTH + if section == "error": + line_type = self.LINE_ERROR + elif section == "warning": + line_type = self.LINE_WARNING + else: + line_type = self.LINE_INFO + self._add_to_section(section, separator, line_type=line_type) + + def add_line(self, line, line_type=LINE_INFO, verbose_only=False, section="summary"): + """ + Add a raw line to output. + + Args: + line: Line text to add + line_type: Type of line ('info', 'error', 'warning', 'verbose') + verbose_only: If True, only include this line when verbose=True + section: Section to add to ('header', 'working_verbose', 'summary', + 'warning', 'error', 'final_message') + """ + self._add_to_section(section, line, line_type=line_type, verbose_only=verbose_only) + + def add_error_line(self, line, verbose_only=False): + """ + Add an error line to output. + + Automatically adds errors header if not already added. 
+ + Args: + line: Line text to add + verbose_only: If True, only include this line when verbose=True + """ + self._has_errors = True + # Automatically add errors header if not already added + if not self._errors_header_added: + self.add_errors_header() + self._add_to_section("error", line, line_type=self.LINE_ERROR, verbose_only=verbose_only) + + def add_warning_line(self, line, verbose_only=False): + """ + Add a warning line to output. + + Args: + line: Line text to add + verbose_only: If True, only include this line when verbose=True + """ + self._has_warnings = True + # Automatically add warnings header if not already added + if not self._warnings_header_added: + self.add_warnings_header() + self._add_to_section( + "warning", line, line_type=self.LINE_WARNING, verbose_only=verbose_only + ) + + def add_verbose_line(self, line, line_type=LINE_INFO): + """ + Add a verbose-only line to working verbose output section. + + If verbose=True, prints line immediately (after ensuring header is printed). + Otherwise buffers it. + + Args: + line: Line text to add + line_type: Type of line ('info', 'error', 'warning') + """ + # Store in working_verbose section for final output + self._add_to_section("working_verbose", line, line_type=line_type, verbose_only=True) + + # If verbose, print immediately (after header if needed) + if self.verbose: + if not self._header_printed: + # Header hasn't been printed yet, but we're trying to stream + # This shouldn't happen if scripts call add_header first, but handle gracefully + pass + print(line) + if self.output_file: + self._streamed_lines.append(line) + + def add_blank_line(self, section="summary"): + """ + Add a blank line to a specific section. + + If verbose=True and section is "working_verbose", prints immediately. + Otherwise buffers it. 
+ + Args: + section: Section to add blank line to + """ + self._add_blank_to_section(section) + + # If verbose and this is working_verbose, print immediately + if self.verbose and section == "working_verbose" and self._header_printed: + print("") + if self.output_file: + self._streamed_lines.append("") + # Track that we've streamed this blank verbose line + self._streamed_verbose_count += 1 + + def _filter_section(self, section_lines, section_metadata): + """ + Filter a section's lines based on verbose mode. + + Args: + section_lines: List of lines in the section + section_metadata: List of (line_type, verbose_only) tuples + + Returns: + Filtered list of lines + """ + filtered = [] + for line, (_line_type, verbose_only) in zip(section_lines, section_metadata): + if not verbose_only or self.verbose: + filtered.append(line) + return filtered + + def _get_ordered_sections(self): + """ + Get all sections in the correct order with filtering applied. + + Header and summary are only included if verbose=True OR if there are warnings/errors. + + Returns: + List of lines in correct order: header, working_verbose, summary, + warning, error, final_message + """ + all_lines = [] + + # Check if we should show header and summary + show_header_summary = self.verbose or self._has_warnings or self._has_errors + + # 1. Header (only if verbose or has warnings/errors) + if show_header_summary: + all_lines.extend(self._filter_section(self.header_lines, self.header_metadata)) + + # 2. Working verbose output + working_verbose = self._filter_section( + self.working_verbose_lines, self.working_verbose_metadata + ) + if working_verbose: + all_lines.extend(working_verbose) + + # 3. Summary (only if verbose or has warnings/errors) + if show_header_summary: + all_lines.extend(self._filter_section(self.summary_lines, self.summary_metadata)) + + # 4. 
Warnings (with header if any warnings exist) + warnings = self._filter_section(self.warning_lines, self.warning_metadata) + if warnings: + separator = "=" * SEPARATOR_WIDTH + all_lines.extend(_ensure_section_header(warnings, "Warnings", separator)) + + # 5. Errors (with header if any errors exist, but skip if only headers/separators) + errors = self._filter_section(self.error_lines, self.error_metadata) + non_header_errors = [ + line for line in errors + if line.strip() and line.strip() != "Errors" and + not (line == "=" * SEPARATOR_WIDTH) + ] + if non_header_errors: + separator = "=" * SEPARATOR_WIDTH + all_lines.extend(_ensure_section_header(errors, "Errors", separator)) + + # 6. Final messages + all_lines.extend( + self._filter_section(self.final_message_lines, self.final_message_metadata) + ) + + return all_lines + + def _get_remaining_lines_to_print(self, all_lines: list) -> list: + """ + Return the slice of all_lines that should be printed. + + If verbose and header already printed, returns only summary onward. + Otherwise returns all_lines. 
+ """ + if not (self.verbose and self._header_printed): + return all_lines + separator = "=" * SEPARATOR_WIDTH + filtered_header = self._filter_section(self.header_lines, self.header_metadata) + filtered_verbose = self._filter_section( + self.working_verbose_lines, self.working_verbose_metadata + ) + skip_count = len(filtered_header) + len(filtered_verbose) + summary_start = None + for i, line in enumerate(all_lines): + if (line == separator and i + 1 < len(all_lines) and + all_lines[i + 1].strip() == "Summary"): + summary_start = i + break + if summary_start is not None: + return all_lines[summary_start:] + if 0 < skip_count < len(all_lines): + return all_lines[skip_count:] + if skip_count > 0: + return [] + for i, line in enumerate(all_lines): + if line == separator and i < 3: + continue + has_colon_with_digit = ( + ':' in line and + any(c.isdigit() for c in line.split(':', 1)[-1].strip()) + ) + if has_colon_with_digit or (line == separator and i > 2): + return all_lines[i:] + return [] + + def print(self): + """ + Print all lines to stdout and optionally to file. + + Outputs sections in correct order: header, working_verbose, summary, + warning, error, final_message. + Filters lines based on verbose mode before printing. + + If verbose=True, header and working_verbose have already been streamed, + so only prints summary, warnings, errors, and final messages. + + After printing, clears all sections. 
+ """ + all_lines = self._get_ordered_sections() + if not all_lines: + return + remaining_lines = self._get_remaining_lines_to_print(all_lines) + + if remaining_lines: + # Collapse consecutive blank lines (at most 1 blank line in a row) + collapsed_lines = [] + prev_was_blank = False + for line in remaining_lines: + is_blank = (not line or not line.strip()) + if is_blank: + # Only add blank line if previous line wasn't blank + if not prev_was_blank: + collapsed_lines.append("") + prev_was_blank = True + else: + collapsed_lines.append(line) + prev_was_blank = False + + output_text = "\n".join(collapsed_lines) + output_text += "\n" # Final newline + print(output_text, end="") + + if self.output_file: + # Append collapsed lines to streamed lines for file output + self._streamed_lines.extend(collapsed_lines) + + # Write to file if specified + if self.output_file: + try: + with open(self.output_file, 'w', encoding='utf-8') as f: + # Combine streamed lines and remaining lines, remove color codes + all_file_lines = self._streamed_lines + remaining_lines + if all_file_lines: + file_text = "\n".join(all_file_lines) + file_text += "\n" # Final newline + file_text = re.sub(r'\033\[[0-9;]*m', '', file_text) + f.write(file_text) + except IOError as e: + print( + f"Error: Cannot write to output file {self.output_file}: {e}", + file=sys.stderr + ) + + # Clear all sections + self.header_lines = [] + self.header_metadata = [] + self.working_verbose_lines = [] + self.working_verbose_metadata = [] + self.summary_lines = [] + self.summary_metadata = [] + self.warning_lines = [] + self.warning_metadata = [] + self.error_lines = [] + self.error_metadata = [] + self.final_message_lines = [] + self.final_message_metadata = [] + self._last_was_blank = {} + self._header_printed = False + self._streamed_lines = [] + + def print_preview(self): + """ + Print all current lines to stdout without clearing. + + Intended for showing output before interactive prompts.
+ """ + all_lines = self._get_ordered_sections() + if not all_lines: + return + output_text = "\n".join(all_lines) + output_text += "\n" + print(output_text, end="") + + def get_lines(self, filter_verbose=True): + """ + Get all lines as a list in correct order (for custom processing). + + Args: + filter_verbose: If True, filter based on verbose mode + + Returns: + List of output lines in correct order + """ + if filter_verbose: + return self._get_ordered_sections() + # If not filtering, combine all sections in order + all_lines = [] + all_lines.extend(self.header_lines) + all_lines.extend(self.working_verbose_lines) + all_lines.extend(self.summary_lines) + all_lines.extend(self.warning_lines) + all_lines.extend(self.error_lines) + all_lines.extend(self.final_message_lines) + return all_lines + + def get_exit_code(self, no_fail=False): + """ + Get the appropriate exit code based on errors found. + + Args: + no_fail: If True, always return 0 (even if errors were found) + + Returns: + 0 if no errors found or no_fail is True, 1 if errors were found + """ + if no_fail: + return 0 + return 0 if not self._has_errors else 1 + + def has_warnings(self) -> bool: + """ + Return True if warnings were recorded. 
+ """ + return self._has_warnings + + def _clear_final_messages(self): + """Clear final message section (used when switching between success/failure).""" + self.final_message_lines = [] + self.final_message_metadata = [] + + def clear(self): + """Clear all accumulated lines from all sections.""" + self.header_lines = [] + self.header_metadata = [] + self.working_verbose_lines = [] + self.working_verbose_metadata = [] + self.summary_lines = [] + self.summary_metadata = [] + self.warning_lines = [] + self.warning_metadata = [] + self.error_lines = [] + self.error_metadata = [] + self.final_message_lines = [] + self.final_message_metadata = [] + self._last_was_blank = {} + self._has_success_message = False + self._has_failure_message = False + self._has_warnings_only_message = False + self._errors_header_added = False + self._warnings_header_added = False + + +class ValidationIssue: + """ + Represents a validation issue found in markdown files. + + This is a shared class used across validation scripts for consistency. + Issues are tracked as List[ValidationIssue] in validation functions. + Use ValidationIssue.create(...) for R0917-friendly construction (≤5 positional). + """ + + @classmethod + def create( + cls, + issue_type: str, + file_path: Path, + start_line: int, + end_line: int, + *, + message: str, + **kwargs + ) -> "ValidationIssue": + """Create a ValidationIssue (avoids too-many-positional-arguments).""" + return cls( + issue_type, file_path, start_line, end_line, + message=message, **kwargs + ) + + def __init__( + self, + issue_type: str, + file_path: Path, + start_line: int, + end_line: int, + *, + message: str, + severity: str = "error", # "error" or "warning" + suggestion: Optional[str] = None, + heading: Optional[str] = None, + **kwargs + ): + """ + Create a ValidationIssue. 
+ + Args: + issue_type: Type of issue (e.g., 'missing_comment', 'heading_format') + file_path: Path to the file (will be converted to string) + start_line: Starting line number + end_line: Ending line number + message: Issue message + severity: "error" or "warning" (default: "error") + suggestion: Optional suggestion for fixing + heading: Optional heading text + **kwargs: Additional type-specific fields (e.g., def_name, def_kind, etc.) + """ + self.issue_type = issue_type + self.file = str(file_path) # Convert Path to string + self.start_line = start_line + self.end_line = end_line + self.message = message + self.severity = severity.lower() # Normalize to lowercase + if self.severity not in ('error', 'warning'): + raise ValueError(f"severity must be 'error' or 'warning', got '{severity}'") + self.suggestion = suggestion + self.heading = heading + self.extra_fields = kwargs # Store additional fields + + def to_dict(self) -> Dict: + """Convert to dictionary for backward compatibility (JSON, reporting, etc.).""" + result = { + 'type': self.issue_type, + 'file': self.file, + 'start_line': self.start_line, + 'end_line': self.end_line, + 'message': self.message, + 'severity': self.severity, + } + if self.suggestion: + result['suggestion'] = self.suggestion + if self.heading: + result['heading'] = self.heading + result.update(self.extra_fields) + return result + + def format_message(self, no_color: bool = False) -> str: + """Format issue message using format_issue_message utility.""" + return format_issue_message( + self.severity, + self.issue_type, + self.file, + line_num=self.start_line, + message=self.message, + suggestion=self.suggestion, + no_color=no_color, + ) + + def matches( + self, + issue_type: Optional[str] = None, + severity: Optional[str] = None + ) -> bool: + """ + Check if this issue matches the given filter criteria. 
+ + Args: + issue_type: Optional issue type to match (exact match) + severity: Optional severity to match (exact match, case-insensitive) + + Returns: + True if the issue matches all provided criteria, False otherwise. + If no criteria are provided, returns True. + """ + if issue_type is not None and self.issue_type != issue_type: + return False + if severity is not None and self.severity != severity.lower(): + return False + return True + + def __repr__(self) -> str: + """String representation for debugging.""" + return ( + f"ValidationIssue(type={self.issue_type!r}, file={self.file!r}, " + f"line={self.start_line}, severity={self.severity!r})" + ) + + def __eq__(self, other) -> bool: + """Equality comparison.""" + if not isinstance(other, ValidationIssue): + return False + return ( + self.issue_type == other.issue_type + and self.file == other.file + and self.start_line == other.start_line + and self.end_line == other.end_line + and self.message == other.message + and self.severity == other.severity + ) diff --git a/scripts/validate_api_go_defs_index.md b/scripts/validate_api_go_defs_index.md index 7398e4dd..6bed5269 100644 --- a/scripts/validate_api_go_defs_index.md +++ b/scripts/validate_api_go_defs_index.md @@ -2,24 +2,40 @@ ## 1. Overview -This document describes the business logic for [validate_api_go_defs_index.py](validate_api_go_defs_index.py). -It is the source of truth for how the Go definitions index is parsed, compared, ordered, reported, and optionally updated. +This document describes the current business logic implemented by [validate_api_go_defs_index.py](validate_api_go_defs_index.py). +If this document and the implementation disagree, the implementation is authoritative. + +The validator scans Go code blocks (` ```go `) in tech specs and ensures that all discovered Go API definitions are present in the Go definitions index. +This is intentionally scoped to Go API definitions only. +Other language code blocks are ignored. 
+Constants and variables are intentionally excluded. ## 2. Inputs and Outputs Inputs: -- Index file: [docs/tech_specs/api_go_defs_index.md](../docs/tech_specs/api_go_defs_index.md). -- Tech specs directory: [docs/tech_specs](../docs/tech_specs). +- Index file (default): [docs/tech_specs/api_go_defs_index.md](../docs/tech_specs/api_go_defs_index.md). +- Tech specs directory (fixed): [docs/tech_specs](../docs/tech_specs). Outputs: -- Structured report to stdout or to the optional output file. -- Exit code 0 on success, 1 on validation failure unless --no-fail is set. +- Structured output (errors, warnings, summary) to stdout, and optionally to `--output`. +- Exit code is determined by the output builder and is forced to 0 when `--no-fail` is set. +- `--apply` may write the index file, but it does not change the process exit code. + +Command line options: + +- `--verbose` / `-v`: Include verbose details (including placement details and the full expected tree at the end). +- `--index-file`: Override the index file path (relative to the repo root). +- `--output` / `-o FILE`: Write detailed output to `FILE`. +- `--no-color` / `--nocolor`: Disable colored output. +- `--no-fail`: Exit with code 0 even if errors are found. +- `--apply`: Apply high-confidence updates and reordering to the index file (interactive confirmation required). ## 3. ParsedIndex Model -ParsedIndex is produced by [scripts/lib/_index_utils.py](lib/_index_utils.py). +ParsedIndex is produced by [scripts/lib/_index_utils.py](lib/_index_utils.py) via +[scripts/lib/go_defs_index/_go_defs_index_indexfile.py](lib/go_defs_index/_go_defs_index_indexfile.py). It contains: - sections: Map of section path to IndexSection. @@ -28,94 +44,157 @@ It contains: - unsorted_paths: Path list for unsorted types, methods, and functions. - title: Document title from the first H1. +During validation, each section may contain: + +- current_entries: Entries that currently exist in the index file. 
+- expected_entries: Entries the validator expects based on discovered definitions and placement. + ## 4. Validation Phases Each phase builds on the data prepared by the previous phase. ### 4.1 Discovery -The validator scans all markdown in [docs/tech_specs](../docs/tech_specs) except the index file. -It extracts Go definitions from ` ```go ` code blocks and creates DetectedDefinition objects. -Heading resolution is performed during discovery, so each definition includes canonical file and anchor data. -`api_file_mgmt_errors.md` participates in discovery like other tech specs. +The validator scans all `*.md` files directly under [docs/tech_specs](../docs/tech_specs), excluding the index file itself. +It extracts Go definitions from ` ```go ` code blocks only. +Example code is excluded using heuristics (for example, example headings and single-line example checks). + +Discovered definitions are normalized into `DetectedDefinition` objects. +Discovery supports: + +- Types (including structs and interfaces). +- Methods (receiver methods). +- Functions, including extraction of referenced types and referenced methods from function signatures to help placement. + +Notes and limitations: + +- Some placement behavior is intentionally hard-coded (for example, receiver-specific method categorization rules and receiver normalization rules). +- This means the validator can drift from expectations as the API surface changes. + - Renames or additions to receiver types can change placement confidence and increase unresolved entries. + - New method families or reorganized sections may require updates to the categorization logic to avoid wrong-section suggestions. +- Sorting is deterministic once placement is complete, but placement quality depends on these rules and the current index structure. + +Discovery also resolves canonical references (file, heading, anchor) for each definition. 
+If a definition name appears in multiple different tech spec files, discovery emits duplicate-definition errors. ### 4.2 Index Parsing -The validator parses the index once using [scripts/lib/_index_utils.py](lib/_index_utils.py). -Each numbered section becomes an IndexSection with current_entries populated from the index file. -Entry description lines are captured and attached to the current IndexEntry objects. +The validator reads the index file and parses it once into a `ParsedIndex`. +Each numbered section becomes an `IndexSection` with `current_entries` populated from the index file. +Entry description text is captured from the index file and attached to `IndexEntry` objects. ### 4.3 Placement -Placement is structure-first and kind-first. -Definitions are placed in this order: types, then methods, then functions. +Placement is kind-first and uses confidence scoring. +Definitions are processed in this order: types, then methods, then functions. +The confidence threshold for high-confidence placement is 0.75. Types: - Types are placed only in type sections. -- Unresolvable types go into unsorted types. +- If no high-confidence section match is found, the type is placed into the unsorted types section with status `unresolved`. Methods: -- Methods are constrained only to the receiver type section's child method subsections. -- Structure-first categorization picks the exact subsection when possible. -- Signature-related Package methods fall back to Package Other Methods when no signature subsection exists. -- Unresolvable methods go into unsorted methods. +- Methods require a receiver type. +- Methods are constrained to the receiver type section's method subsections (including nested method subsections). +- Receiver types may be normalized using implementation-to-interface mappings (for example, `ReadonlyPackage` and `FilePackage` are treated as `Package`). 
+- Category rules are applied for some receiver types (for example `Package` and `FileEntry`) to prefer structure-first placement when a category subsection exists. +- If no high-confidence match is found, the method is placed into the unsorted methods section with status `unresolved` and a suggested section (if any). Functions: -- Functions are placed using referenced types or methods first. -- If no relationship is found, scoring chooses among helper sections. -- Scoring logic is split into focused modules under `scripts/lib/go_defs_index/`. -- Unresolvable functions go into unsorted functions. +- Functions may be placed into function subsections under a related type section when the function signature references types or methods. +- If no relationship is found, scoring selects among all function sections. +- If no high-confidence match is found, the function is placed into the unsorted functions section with status `unresolved` and a suggested section (if any). ### 4.4 Comparison -Comparison reconciles unsorted expected entries with existing current entries. -It sets entry_status for current and expected entries as added, moved, removed, orphaned, present, or unresolved. -Link updates are detected by comparing expected and current link targets. +Comparison reconciles `expected_entries` against `current_entries` and sets statuses. +Comparison also moves expected entries out of unsorted sections if the same entry already exists in the index (so that section status can be evaluated in the correct section). + +Statuses applied during comparison: + +- For current index entries: + - `orphaned`: The entry is not expected anywhere (it does not match any discovered definition). + - `removed`: The entry exists, but the expected section for that entry is different from the current section. + - `present`: The entry matches the expected section. +- For expected entries: + - `added`: The entry is expected but does not exist in the current index. 
+ - `moved`: The entry exists in the index but in a different section. + - `present`: The entry exists in the correct section. + - `unresolved`: The entry could not be confidently placed (typically in an unsorted section). + +Link update detection is performed by comparing expected and current link targets. +When a mismatch is detected, the current entry is marked as needing a link update and the expected link target is recorded. ### 4.5 Description Validation -Description checks require at least 20 characters of description text per entry. -If a description is missing, def comments from expected entries are used as suggestions. +Description validation enforces: + +- Each indexed entry that is expected must have description text with a minimum length of 20 characters. +- Description text should be unique. + Multiple entries sharing the exact same description are treated as errors. + +When an entry is missing a description, the validator suggests using the definition's doc comments when available. ### 4.6 Ordering -Ordering sorts expected entries within each section. -Sorting uses ParsedIndex.sort_expected_entries with capitals-first, case-insensitive ordering. +Ordering performs two actions: + +- Emits warnings when entries in the current index file are not in the expected alphabetical order. + These are warnings (not errors) and are capped per section. +- Sorts expected entries within each section via `ParsedIndex.sort_expected_entries`. ### 4.7 Reporting -Reporting uses entry_status to list missing, moved, orphaned, and link issues. -Low-confidence entries are reported from unsorted expected entries. -The full expected tree is rendered from ParsedIndex.render_full_tree. +Reporting emits: + +- Missing high-confidence definitions (expected entries with status `added`). +- Orphaned entries (current entries with status `orphaned`). +- Wrong-section entries (expected entries with status `moved`). +- Incorrect link targets (current entries flagged for link updates). 
+- Low-confidence unresolved definitions (expected entries with status `unresolved` and confidence < 75%). + +When `--verbose` is set, the script also prints the full expected index tree from `ParsedIndex.render_full_tree`. +The final summary includes a breakdown of issue counts, and also includes counts of zero-confidence placements from discovery and placement. ## 5. Apply Switch -The --apply switch overwrites the index file with a full render of the expected structure. -It requires a TTY and an explicit "yes" confirmation before writing. -The regeneration includes the overview and table of contents. -Unsorted sections are never applied and unresolved items are excluded by design. +The `--apply` switch overwrites the index file with a full render of the expected structure. +The regenerated file includes the overview and table of contents. + +Interactive confirmation requirements: + +- The user must type `yes` to confirm. +- If stdin is a TTY, confirmation is read from stdin. +- If stdin is not a TTY, confirmation is read from `/dev/tty` when available. +- If neither is possible, `--apply` exits with code 1 and prints an error explaining that an interactive terminal is required. + +Apply triggers: + +- `--apply` only writes when there are pending high-confidence changes, including description fix candidates. +- If there are no pending changes, it prints `No high-confidence updates to apply.` and does not write. + +Description application behavior: + +- Before writing, the validator syncs expected descriptions and may populate missing expected entry descriptions derived from definition doc comments. +- This allows `--apply` to fix missing descriptions even when there are no structural or link changes, as long as suitable doc comments are available. + +`--apply` is executed after output is printed and does not change the process exit code. ## 6. 
Related Files - [scripts/validate_api_go_defs_index.py](validate_api_go_defs_index.py) - [scripts/lib/_index_utils.py](lib/_index_utils.py) +- [scripts/lib/go_defs_index/_go_defs_index_discovery.py](lib/go_defs_index/_go_defs_index_discovery.py) +- [scripts/lib/go_defs_index/_go_defs_index_indexfile.py](lib/go_defs_index/_go_defs_index_indexfile.py) - [scripts/lib/go_defs_index/_go_defs_index_matching.py](lib/go_defs_index/_go_defs_index_matching.py) - [scripts/lib/go_defs_index/_go_defs_index_comparison.py](lib/go_defs_index/_go_defs_index_comparison.py) +- [scripts/lib/go_defs_index/_go_defs_index_descriptions.py](lib/go_defs_index/_go_defs_index_descriptions.py) - [scripts/lib/go_defs_index/_go_defs_index_ordering.py](lib/go_defs_index/_go_defs_index_ordering.py) - [scripts/lib/go_defs_index/_go_defs_index_reporting.py](lib/go_defs_index/_go_defs_index_reporting.py) -- [scripts/lib/go_defs_index/_go_defs_index_config.py](lib/go_defs_index/_go_defs_index_config.py) -- [scripts/lib/go_defs_index/_go_defs_index_scoring.py](lib/go_defs_index/_go_defs_index_scoring.py) -- [scripts/lib/go_defs_index/_go_defs_index_scoring_domain.py](lib/go_defs_index/_go_defs_index_scoring_domain.py) -- [scripts/lib/go_defs_index/_go_defs_index_scoring_text.py](lib/go_defs_index/_go_defs_index_scoring_text.py) -- [scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core.py](lib/go_defs_index/_go_defs_index_scoring_rules_core.py) -- [scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_base.py](lib/go_defs_index/_go_defs_index_scoring_rules_core_base.py) -- [scripts/lib/go_defs_index/_go_defs_index_scoring_rules_core_domain.py](lib/go_defs_index/_go_defs_index_scoring_rules_core_domain.py) -- [scripts/lib/go_defs_index/_go_defs_index_scoring_rules_methods.py](lib/go_defs_index/_go_defs_index_scoring_rules_methods.py) -- [scripts/lib/go_defs_index/_go_defs_index_scoring_rules_sections.py](lib/go_defs_index/_go_defs_index_scoring_rules_sections.py) -- 
[scripts/lib/go_defs_index/_go_defs_index_scoring_rules_penalties.py](lib/go_defs_index/_go_defs_index_scoring_rules_penalties.py) +- [scripts/lib/go_defs_index/_go_defs_index_apply_parsing.py](lib/go_defs_index/_go_defs_index_apply_parsing.py) (used when applying index updates) ## 7. Validation Commands diff --git a/scripts/validate_api_go_defs_index.py b/scripts/validate_api_go_defs_index.py index 24a7a7f5..9e0730b1 100644 --- a/scripts/validate_api_go_defs_index.py +++ b/scripts/validate_api_go_defs_index.py @@ -19,41 +19,26 @@ import os import sys from pathlib import Path +from typing import Dict, Optional + +from lib import _validation_utils +from lib import _index_utils +from lib.go_defs_index import _go_defs_index_discovery as _discovery +from lib.go_defs_index import _go_defs_index_models as _models +from lib.go_defs_index import _go_defs_index_indexfile as _indexfile +from lib.go_defs_index import _go_defs_index_matching as _matching +from lib.go_defs_index import _go_defs_index_comparison as _comparison +from lib.go_defs_index import _go_defs_index_descriptions as _descriptions +from lib.go_defs_index import _go_defs_index_ordering as _ordering +from lib.go_defs_index import _go_defs_index_reporting as _reporting -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" -go_defs_dir = lib_dir / "go_defs_index" - -# Import shared utilities -for module_path in (str(scripts_dir), str(lib_dir), str(go_defs_dir)): - if module_path not in sys.path: - sys.path.insert(0, module_path) - -# Import shared utilities -import lib._validation_utils as _validation_utils # noqa: E402 - -import lib._index_utils as _index_utils # noqa: E402 ParsedIndex = _index_utils.ParsedIndex - -import lib.go_defs_index._go_defs_index_discovery as _discovery # noqa: E402 discover_all_definitions_phase1 = _discovery.discover_all_definitions - - -import lib.go_defs_index._go_defs_index_indexfile as _indexfile # noqa: E402 +DetectedDefinition = _models.DetectedDefinition 
parse_index_indexfile = _indexfile.parse_index - -import lib.go_defs_index._go_defs_index_matching as _matching # noqa: E402 - -import lib.go_defs_index._go_defs_index_comparison as _comparison # noqa: E402 compare_with_index_phase4 = _comparison.compare_with_index - -import lib.go_defs_index._go_defs_index_descriptions as _descriptions # noqa: E402 check_entry_descriptions_phase5 = _descriptions.check_entry_descriptions - -import lib.go_defs_index._go_defs_index_ordering as _ordering # noqa: E402 determine_ordering_phase6 = _ordering.determine_ordering - -import lib.go_defs_index._go_defs_index_reporting as _reporting # noqa: E402 generate_report_phase7 = _reporting.generate_report INDEX_FILENAME = "api_go_defs_index.md" @@ -450,6 +435,7 @@ def _apply_index_updates( parsed_index.get_removed_entries(), parsed_index.get_orphans(), parsed_index.get_link_update_entries(), + parsed_index.get_reordered_entries(), has_description_fix_candidates, ] ) @@ -521,12 +507,12 @@ def _read_index_file_or_exit( _fatal_validation_issue( output=output, no_fail=no_fail, - issue=ValidationIssue( + issue=ValidationIssue.create( "index_file_read_error", index_file, 0, 0, - f"Could not read index file: {e}", + message=f"Could not read index file: {e}", severity="error", ), ) @@ -534,25 +520,25 @@ def _read_index_file_or_exit( _fatal_validation_issue( output=output, no_fail=no_fail, - issue=ValidationIssue( + issue=ValidationIssue.create( "index_file_decode_error", index_file, 0, 0, - f"Could not decode index file (encoding issue): {e}", + message=f"Could not decode index file (encoding issue): {e}", severity="error", ), ) - except Exception as e: # pylint: disable=broad-exception-caught + except (ValueError, KeyError, TypeError, RuntimeError, MemoryError, BufferError) as e: _fatal_validation_issue( output=output, no_fail=no_fail, - issue=ValidationIssue( + issue=ValidationIssue.create( "index_file_unexpected_error", index_file, 0, 0, - f"Unexpected error reading index file: {e}", + 
message=f"Unexpected error reading index file: {e}", severity="error", ), ) @@ -572,12 +558,12 @@ def _parse_index_or_exit( _fatal_validation_issue( output=output, no_fail=no_fail, - issue=ValidationIssue( + issue=ValidationIssue.create( "duplicate_headings", index_file, 0, 0, - f"{e}", + message=f"{e}", severity="error", ), ) @@ -590,6 +576,7 @@ def _add_index_summary( definitions_count: int, parsed_index: ParsedIndex, description_errors: int, + zero_confidence_counts: Optional[Dict[str, int]] = None, ) -> int: added_entries = len(parsed_index.get_added_entries()) moved_entries = len(parsed_index.get_moved_entries()) @@ -628,23 +615,51 @@ def _add_index_summary( if summary_items: output.add_summary_header() summary_items.insert(0, ("Total issues:", total_issues)) + if zero_confidence_counts: + summary_items.extend(_format_zero_confidence_summary(zero_confidence_counts)) output.add_summary_section(summary_items) output.add_failure_message("Validation failed. Please fix the errors above.") return total_issues output.add_summary_header() - output.add_summary_section( - [ - ("Definitions checked:", definitions_count), - ("All definitions indexed:", definitions_count), - ] - ) - output.add_success_message( - "No errors or suggestions found. All definitions are correctly indexed." 
- ) + summary_items = [ + ("Definitions checked:", definitions_count), + ("All definitions indexed:", definitions_count), + ] + if zero_confidence_counts: + summary_items.extend(_format_zero_confidence_summary(zero_confidence_counts)) + output.add_summary_section(summary_items) return 0 +def _format_zero_confidence_summary( + zero_confidence_counts: Dict[str, int], +) -> list[tuple[str, int]]: + summary_items: list[tuple[str, int]] = [] + for label, key in ( + ("Zero-confidence types:", "type"), + ("Zero-confidence functions:", "func"), + ("Zero-confidence methods:", "method"), + ("Zero-confidence total:", "total"), + ): + count = zero_confidence_counts.get(key, 0) + if count: + summary_items.append((label, count)) + return summary_items + + +def _count_zero_confidence(definitions: list[DetectedDefinition]) -> Dict[str, int]: + counts = {"type": 0, "func": 0, "method": 0, "total": 0} + for definition in definitions: + score = definition.confidence_score + if score is None or score > 0.0: + continue + if definition.kind in counts: + counts[definition.kind] += 1 + counts["total"] += 1 + return counts + + def _build_arg_parser() -> argparse.ArgumentParser: parser = argparse.ArgumentParser( description=( @@ -711,12 +726,12 @@ def main() -> None: _fatal_validation_issue( output=output, no_fail=args.no_fail, - issue=ValidationIssue( + issue=ValidationIssue.create( "tech_specs_dir_not_found", tech_specs_dir, 0, 0, - f"Tech specs directory not found: {tech_specs_dir}", + message=f"Tech specs directory not found: {tech_specs_dir}", severity="error", ), ) @@ -725,12 +740,12 @@ def main() -> None: _fatal_validation_issue( output=output, no_fail=args.no_fail, - issue=ValidationIssue( + issue=ValidationIssue.create( "index_file_not_found", index_file, 0, 0, - f"Index file not found: {index_file}", + message=f"Index file not found: {index_file}", severity="error", ), ) @@ -792,15 +807,15 @@ def main() -> None: index_file, output, ) - except Exception as e: + except (ValueError, 
KeyError, TypeError, AttributeError, RuntimeError) as e: if output: output.add_error_line( - ValidationIssue( + ValidationIssue.create( "error_parsing_descriptions", index_file, 0, 0, - f"Could not parse index file for description checking: {e}", + message=f"Could not parse index file for description checking: {e}", severity="error", ).format_message(no_color=output.no_color) ) @@ -820,13 +835,33 @@ def main() -> None: index_file_name, ) - _add_index_summary( + zero_confidence_counts = _count_zero_confidence(definitions) + total_issues = _add_index_summary( output=output, definitions_count=len(definitions), parsed_index=parsed_index, description_errors=description_errors, + zero_confidence_counts=zero_confidence_counts, ) + if output.verbose: + output.add_blank_line("final_message") + output.add_line("Expected index (full tree):", section="final_message") + output.add_blank_line("final_message") + for line in parsed_index.render_full_tree(): + output.add_line(line, section="final_message") + output.add_blank_line("final_message") + + if not total_issues: + if output.has_warnings(): + output.add_warnings_only_message( + verbose_hint="Run with --verbose to see the full warning details.", + ) + else: + output.add_success_message( + "No errors or suggestions found. All definitions are correctly indexed." + ) + final_exit_code = output.get_exit_code(args.no_fail) output.print() if args.apply: diff --git a/scripts/validate_go_code_blocks.py b/scripts/validate_go_code_blocks.py index f4956a81..7f4f3861 100644 --- a/scripts/validate_go_code_blocks.py +++ b/scripts/validate_go_code_blocks.py @@ -7,13 +7,10 @@ 2. Each Go code block has at most one function or method definition 3. Type definitions and function definitions are mutually exclusive in a code block 4. Each Go code block is under a different heading -5. Function headings should NOT include the function name in backticks - (e.g., NewPackage, not `NewPackage`) -6. 
Method headings should NOT include type and method name in backticks - (e.g., FileEntry.GetProcessingState, not `FileEntry.GetProcessingState`) -7. Type/Interface/Struct headings should NOT include the type name in backticks - (e.g., Package, not `Package`) -8. All type, interface, struct, function, and method definitions have +5. Function/Method/Type headings should include the definition name and kind word; + definition names are preferred in backticks (e.g. `` `Package.Write` Method ``). + Case inside backticks is ignored for validation. +6. All type, interface, struct, function, and method definitions have comments preceding them Usage: @@ -49,28 +46,26 @@ docs/tech_specs/api_file_management.md,docs/tech_specs/api_core.md """ +import re import sys from pathlib import Path from collections import defaultdict from typing import List, Tuple, Dict, Optional -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" - -# Import shared utilities -for module_path in (str(scripts_dir), str(lib_dir)): - if module_path not in sys.path: - sys.path.insert(0, module_path) - -from lib._validation_utils import ( # noqa: E402 +from lib._validation_utils import ( OutputBuilder, parse_no_color_flag, find_markdown_files, parse_paths, get_validation_exit_code, - find_heading_for_code_block, format_issue_message, - has_backticks, get_backticks_error_message, + find_heading_for_code_block, + remove_backticks_keep_content, ValidationIssue, DOCS_DIR, TECH_SPECS_DIR ) -from lib._go_code_utils import ( # noqa: E402 +from lib._validate_go_code_blocks_report import ( + generate_report, + print_summary, + _has_non_warning_errors, +) +from lib._go_code_utils import ( find_go_code_blocks as find_go_code_blocks_base, is_example_code, is_example_signature_name, @@ -82,7 +77,7 @@ ) -def find_go_code_blocks(content: str, file_path: str) -> List[Tuple[int, int, str, Optional[str]]]: +def find_go_code_blocks(content: str, _file_path: str) -> List[Tuple[int, int, str, Optional[str]]]: """ 
Find all Go code blocks in markdown content with heading context. @@ -117,7 +112,7 @@ def has_preceding_comment(code_lines: List[str], def_line_index: int) -> bool: Returns: True if there's a comment preceding the definition, False otherwise """ - if def_line_index == 0: + if not def_line_index: # First line of code block, no preceding comment possible return False @@ -187,6 +182,7 @@ def validate_single_definition( start_line: int, end_line: int, lines: List[str], + *, heading: Optional[str], is_type: bool, file_path: Path, @@ -206,8 +202,8 @@ def validate_single_definition( # Check for missing comment check_missing_comment( - code, code_lines, start_line, lines, heading, - is_type, file_path, issues + code, code_lines, start_line, lines, + heading=heading, is_type=is_type, file_path=file_path, issues=issues ) # Validate heading format @@ -229,6 +225,10 @@ def validate_single_definition( is_valid, error_messages, suggestion = validate_heading_format( heading, name=def_name, kind=def_kind, receiver_type=receiver_type ) + search_term = ( + f'{receiver_type}.{def_name}' if def_kind == 'method' else def_name + ) + kind_word = def_kind.capitalize() if not is_valid: for error_msg in error_messages: extra_fields = {} @@ -238,18 +238,39 @@ def validate_single_definition( else: extra_fields['func_name'] = def_name extra_fields['receiver_type'] = receiver_type - extra_fields['is_method'] = (def_kind == 'method') - issues.append(ValidationIssue( + extra_fields['is_method'] = def_kind == 'method' + issues.append(ValidationIssue.create( 'heading_format', file_path, start_line, end_line, - error_msg, + message=error_msg, severity='error', suggestion=suggestion, heading=heading, **extra_fields )) + elif not _heading_has_name_in_backticks(heading, search_term): + suggested_heading = suggest_heading(heading, search_term, kind_word) + extra_fields = {} + if is_type: + extra_fields['type_name'] = def_name + extra_fields['kind'] = kind + else: + extra_fields['func_name'] = def_name + 
extra_fields['receiver_type'] = receiver_type + extra_fields['is_method'] = def_kind == 'method' + issues.append(ValidationIssue.create( + 'heading_prefer_backticks', + file_path, + start_line, + end_line, + message='Prefer backticks for definition name in heading', + severity='warning', + suggestion=f'Suggested: {suggested_heading}', + heading=heading, + **extra_fields + )) def check_missing_comment( @@ -257,6 +278,7 @@ def check_missing_comment( code_lines: List[str], start_line: int, lines: List[str], + *, heading: Optional[str], is_type: bool, file_path: Path, @@ -293,13 +315,15 @@ def check_missing_comment( # Check for comment if not has_preceding_comment(code_lines, def_line_idx): def_line_num = start_line + def_line_idx - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( 'missing_comment', file_path, def_line_num, def_line_num, - f'{def_kind.capitalize()} definition `{def_display}` ' - 'does not have a preceding comment', + message=( + f'{def_kind.capitalize()} definition `{def_display}` ' + 'does not have a preceding comment' + ), severity='error', suggestion=( f'Add a comment before the {def_kind} definition ' @@ -311,24 +335,40 @@ def check_missing_comment( )) +def _heading_contains_name(heading: str, search_term: str) -> bool: + """ + Return True if heading contains the definition name (case-insensitive). + Strips backticks for comparison so `` `Package.Write` `` matches Package.Write. + """ + normalized = remove_backticks_keep_content(heading) + return search_term.lower() in normalized.lower() + + +def _heading_has_name_in_backticks(heading: str, search_term: str) -> bool: + """ + Return True if heading contains the definition name inside backticks. + Used to recommend backticks when the heading is valid but plain. 
+ """ + pattern = re.compile(r'`' + re.escape(search_term) + r'`', re.IGNORECASE) + return pattern.search(heading) is not None + + def suggest_heading(heading: str, search_term: str, kind_word: str) -> str: """ Suggest a corrected heading for a definition. + Prefers definition name in backticks: `` `Name` Kind remaining ``. - Normalizes the heading to the format: [number] search_term kind_word remaining_text - by removing backticks and preserving the numbering prefix (if present). + Preserves the numbering prefix if present. Args: heading: Current heading text (may include numbering like "2.5 Heading Text") - search_term: The term to search for (e.g., "Package", "FileEntry.GetState", "NewPackage") + search_term: The term to use (e.g., "Package", "FileEntry.GetState", "NewPackage") kind_word: The kind word to include (e.g., "Method", "Function", "Struct", "Interface") Returns: - Suggested heading in format: {number} {search_term} {kind_word} {remaining_text} + Suggested heading in format: {number} `{search_term}` {kind_word} {remaining} (number is preserved if present in original heading) """ - import re - # Pattern to match numbered headings: "2.5" or "2.5.3" etc. numbered_pattern = re.compile(r'^([0-9]+(?:\.[0-9]+)*)\.?\s+(.+)$') @@ -340,29 +380,28 @@ def suggest_heading(heading: str, search_term: str, kind_word: str) -> str: numbering_prefix = match.group(1) heading_without_number = match.group(2) - # Step 1: Remove all backticks from heading text (not from numbering) - heading_without_number = heading_without_number.replace('`', '') + # Normalize (strip backticks) for extracting remaining text + normalized = remove_backticks_keep_content(heading_without_number) - # Step 2: Remove search_term (case-insensitive) + # Remove search_term (case-insensitive) if '.' 
in search_term: - # Method format - exact match pattern = re.compile(re.escape(search_term), re.IGNORECASE) else: - # Simple name - use word boundaries pattern = re.compile(r'\b' + re.escape(search_term) + r'\b', re.IGNORECASE) - heading_without_number = pattern.sub('', heading_without_number, count=1) + normalized = pattern.sub('', normalized, count=1) - # Step 3: Remove kind_word (case-insensitive, whole word) + # Remove kind_word (case-insensitive, whole word) kind_pattern = re.compile(r'\b' + re.escape(kind_word) + r'\b', re.IGNORECASE) - heading_without_number = kind_pattern.sub('', heading_without_number, count=1) + normalized = kind_pattern.sub('', normalized, count=1) + # Also remove common long form (e.g. "Structure" when kind is "Struct") + if kind_word == 'Struct': + normalized = re.sub(r'\bStructure\b', '', normalized, count=1, flags=re.IGNORECASE) - # Step 4: Clean up extra whitespace - remaining = ' '.join(heading_without_number.split()) + remaining = ' '.join(normalized.split()) - # Step 5: Build the suggested heading - suggested = f"{search_term} {kind_word} {remaining}".strip() + # Prefer definition name in backticks + suggested = f"`{search_term}` {kind_word} {remaining}".strip() - # Step 6: Add numbering prefix if it was present if numbering_prefix: suggested = f"{numbering_prefix} {suggested}" @@ -382,13 +421,14 @@ def validate_heading_format( structs, and other type definitions. 
Rules: - - Functions: Heading should include function name NOT in backticks (e.g., NewPackage) - and should include "Function" immediately after the name - - Methods: Heading should include type and method name in format - Type.MethodName NOT in backticks (e.g., FileEntry.GetProcessingState) - and should include "Method" immediately after Type.MethodName - - Types/Interfaces/Structs: Heading should include type name NOT in backticks (e.g., Package) - and should include the kind word ("Interface", "Struct", or "Type") immediately after the name + - Definition name may appear in backticks (preferred) or plain; case inside + backticks is ignored for validation. + - Functions: Heading should include function name and "Function" + (e.g. `` `NewPackage` Function ``). + - Methods: Heading should include Type.MethodName and "Method" + (e.g. `` `Package.Write` Method ``). + - Types/Interfaces/Structs: Heading should include type name and kind word + (e.g. `` `Package` Struct ``). Args: heading: The heading text (required) @@ -406,45 +446,44 @@ def validate_heading_format( errors = [] # Determine expected kind word by capitalizing the kind - # This preserves the original kind (e.g., 'alias' -> 'Alias', not 'Type') kind_word = kind.capitalize() - # Check for backticks in heading (common check for all cases) - if has_backticks(heading): - errors.append(get_backticks_error_message()) - # Determine search term and display name based on kind if kind == 'method' and receiver_type: search_term = f'{receiver_type}.{name}' - display_name = search_term + display_name = f'`{search_term}`' else: search_term = name - display_name = name + display_name = f'`{name}`' - # Check if search term is present in heading (common for all cases except methods) - if kind != 'method' and name not in heading: + # Check if definition name is present (case-insensitive; backticks stripped for comparison) + if kind != 'method' and not _heading_contains_name(heading, search_term): errors.append( - f'Heading 
should include {kind.capitalize()} name: {name}' + f'Heading should include {kind.capitalize()} name: {name} (prefer in backticks)' ) match kind: case 'method': if receiver_type: - if search_term not in heading: + if not _heading_contains_name(heading, search_term): errors.append( - f'Method heading should include {display_name} (without backticks)' + f'Method heading should include {display_name} and {kind_word}' ) else: check_kind_word_after( - heading, search_term, kind_word, display_name, - 'Method heading', errors + heading, search_term, kind_word, + display_name=display_name, + error_prefix='Method heading', + errors=errors ) case _: # Default: types and functions - if name in heading: + if _heading_contains_name(heading, search_term): check_kind_word_after( - heading, search_term, kind_word, display_name, - f'{kind.capitalize()} heading', errors + heading, search_term, kind_word, + display_name=display_name, + error_prefix=f'{kind.capitalize()} heading', + errors=errors ) # Generate suggestion if there are errors (common logic for all cases) @@ -459,7 +498,87 @@ def validate_heading_format( suggested_heading = suggest_heading(heading, search_term, kind_word) suggestion = f'Suggested: {suggested_heading}' - return (len(errors) == 0, errors, suggestion) + return (not errors, errors, suggestion) + + +def _append_block_count_issues( + issues: List[ValidationIssue], + file_path: Path, + start_line: int, + end_line: int, + heading: Optional[str], + *, + type_count: int, + func_count: int, + func_type_count: int, +) -> None: + """Append validation issues for block definition counts (multiple types/funcs, etc.).""" + if type_count > 1: + issues.append(ValidationIssue.create( + 'multiple_types', file_path, start_line, end_line, + message=f'Code block has {type_count} type/interface definitions (max 1 allowed)', + severity='error', + suggestion='Split into separate code blocks, one per type/interface definition', + heading=heading, type_count=type_count, + )) + if 
func_count > 1: + issues.append(ValidationIssue.create( + 'multiple_funcs', file_path, start_line, end_line, + message=f'Code block has {func_count} func definitions (max 1 allowed)', + severity='error', + suggestion='Split into separate code blocks, one per function/method definition', + heading=heading, func_count=func_count, + )) + if type_count > 0 and func_count > 0: + issues.append(ValidationIssue.create( + 'type_func_exclusive', file_path, start_line, end_line, + message=( + f'Code block has both {type_count} type definition(s) and ' + f'{func_count} func definition(s) (must be exclusive)' + ), + severity='error', + suggestion=( + 'Separate type definitions and function definitions into different code blocks' + ), + heading=heading, type_count=type_count, func_count=func_count, + )) + if func_type_count > 0: + issues.append(ValidationIssue.create( + 'function_type_warning', file_path, start_line, end_line, + message=( + f'Code block has {func_type_count} function type definition(s) ' + '(review recommended)' + ), + severity='warning', + suggestion=( + 'Review if function type definitions are intentional and properly documented' + ), + heading=heading, func_type_count=func_type_count, + )) + + +def _append_heading_usage_issues( + issues: List[ValidationIssue], + file_path: Path, + heading_usage: Dict[str, List[tuple]], +) -> None: + """Append issues for multiple blocks per heading.""" + for heading, blocks_under_heading in heading_usage.items(): + if len(blocks_under_heading) <= 1: + continue + first_block_start = min(b[0] for b in blocks_under_heading) + last_block_end = max(b[1] for b in blocks_under_heading) + issues.append(ValidationIssue.create( + 'multiple_blocks_per_heading', + file_path, first_block_start, last_block_end, + message=( + f'Heading "{heading}" has {len(blocks_under_heading)} ' + 'Go code blocks (each should be under a different heading)' + ), + severity='error', + suggestion='Move each code block to a separate heading', + 
heading=heading, blocks=blocks_under_heading, + )) def audit_file(file_path: Path) -> Dict: @@ -471,8 +590,6 @@ def audit_file(file_path: Path) -> Dict: content = file_path.read_text(encoding='utf-8') lines = content.split('\n') blocks = find_go_code_blocks(content, str(file_path)) - - # Track headings used by code blocks heading_usage = defaultdict(list) for start_line, end_line, code, heading in blocks: @@ -480,8 +597,6 @@ def audit_file(file_path: Path) -> Dict: type_count = counts['type'] func_count = counts['func'] + counts['method'] func_type_count = counts['func_type'] - - # Cache code_lines once per code block to avoid repeated splits code_lines = code.split('\n') code_blocks.append({ @@ -494,163 +609,55 @@ def audit_file(file_path: Path) -> Dict: 'code_preview': code[:100] + '...' if len(code) > 100 else code }) - # Check if this is example code (skip validation for example code blocks) - is_example = is_example_code( + if is_example_code( code, start_line, - lines=lines, - heading_text=heading, - check_prose_before_block=True - ) - - # Skip all validation checks if this is example code - if is_example: - # Track heading usage even for example code + lines=lines, heading_text=heading, check_prose_before_block=True + ): if heading: heading_usage[heading].append((start_line, end_line)) continue - # Check: at most one type/interface definition - if type_count > 1: - issues.append(ValidationIssue( - 'multiple_types', - file_path, - start_line, - end_line, - f'Code block has {type_count} type/interface definitions ' - '(max 1 allowed)', - severity='error', - suggestion=( - 'Split into separate code blocks, ' - 'one per type/interface definition' - ), - heading=heading, - type_count=type_count - )) - - # Check: at most one function/method definition - if func_count > 1: - issues.append(ValidationIssue( - 'multiple_funcs', - file_path, - start_line, - end_line, - f'Code block has {func_count} func definitions ' - '(max 1 allowed)', - severity='error', - 
suggestion=( - 'Split into separate code blocks, ' - 'one per function/method definition' - ), - heading=heading, - func_count=func_count - )) - - # Check: type and func definitions are mutually exclusive - if type_count > 0 and func_count > 0: - issues.append(ValidationIssue( - 'type_func_exclusive', - file_path, - start_line, - end_line, - f'Code block has both {type_count} type definition(s) and ' - f'{func_count} func definition(s) (must be exclusive)', - severity='error', - suggestion=( - 'Separate type definitions and function definitions ' - 'into different code blocks' - ), - heading=heading, - type_count=type_count, - func_count=func_count - )) - - # Check: function types (warnings for review) - if func_type_count > 0: - issues.append(ValidationIssue( - 'function_type_warning', - file_path, - start_line, - end_line, - f'Code block has {func_type_count} function type definition(s) ' - '(review recommended)', - severity='warning', - suggestion=( - 'Review if function type definitions are intentional ' - 'and properly documented' - ), - heading=heading, - func_type_count=func_type_count - )) - - # Check: type definitions have preceding comments + _append_block_count_issues( + issues, file_path, start_line, end_line, heading, + type_count=type_count, func_count=func_count, func_type_count=func_type_count, + ) if type_count > 0: check_missing_comment( - code, code_lines, start_line, lines, heading, - is_type=True, file_path=file_path, issues=issues + code, code_lines, start_line, lines, + heading=heading, is_type=True, file_path=file_path, issues=issues ) - - # Check: function/method definitions have preceding comments if func_count > 0: check_missing_comment( - code, code_lines, start_line, lines, heading, - is_type=False, file_path=file_path, issues=issues + code, code_lines, start_line, lines, + heading=heading, is_type=False, file_path=file_path, issues=issues ) - - # Check: heading format and comments for single definitions - # Unified validation for both 
types and functions - if (type_count == 1 and func_count == 0) or (func_count == 1 and type_count == 0): - is_type = type_count == 1 + if (type_count == 1 and not func_count) or (func_count == 1 and not type_count): validate_single_definition( code, code_lines, start_line, end_line, lines, - heading, is_type, file_path, issues + heading=heading, is_type=(type_count == 1), + file_path=file_path, issues=issues ) - - # Track heading usage if heading: heading_usage[heading].append((start_line, end_line)) - # Check: each code block should be under a different heading - for heading, blocks_under_heading in heading_usage.items(): - if len(blocks_under_heading) > 1: - # Calculate line range: first block start to last block end - first_block_start = min(block[0] for block in blocks_under_heading) - last_block_end = max(block[1] for block in blocks_under_heading) - - issues.append(ValidationIssue( - 'multiple_blocks_per_heading', - file_path, - first_block_start, - last_block_end, - ( - f'Heading "{heading}" has {len(blocks_under_heading)} ' - 'Go code blocks (each should be under a different heading)' - ), - severity='error', - suggestion='Move each code block to a separate heading', - heading=heading, - blocks=blocks_under_heading - )) - - # Check for code blocks without headings + _append_heading_usage_issues(issues, file_path, dict(heading_usage)) for block in code_blocks: if block['heading'] is None: - issues.append(ValidationIssue( - 'no_heading', - file_path, - block['start_line'], - block['end_line'], - 'Code block is not under any heading', + issues.append(ValidationIssue.create( + 'no_heading', file_path, + block['start_line'], block['end_line'], + message='Code block is not under any heading', severity='error', suggestion='Add a heading above the code block' )) - except Exception as e: - issues.append(ValidationIssue( + except (OSError, ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: + issues.append(ValidationIssue.create( 'error', file_path, 
1, 1, - f'Error reading file: {e}', + message=f'Error reading file: {e}', severity='error' )) @@ -661,300 +668,6 @@ def audit_file(file_path: Path) -> Dict: } -def generate_report(results: List[Dict], output_path: Path) -> None: - """Generate markdown report from audit results.""" - report_lines = [] - - report_lines.append('# Go Code Blocks Validation Report') - report_lines.append('') - report_lines.append('This report validates all Go code blocks in the tech specs documentation.') - report_lines.append('') - report_lines.append('## Requirements') - report_lines.append('') - report_lines.append( - '1. Each Go code block should have at most one type or interface ' - 'definition' - ) - report_lines.append( - '2. Each Go code block should have at most one function or method ' - 'definition' - ) - report_lines.append( - '3. Type definitions and function definitions are mutually exclusive ' - 'in a code block' - ) - report_lines.append('4. Each Go code block should be under a different heading') - report_lines.append( - '5. Function headings should NOT include the function name in backticks ' - '(e.g., NewPackage, not `NewPackage`)' - ) - report_lines.append( - '6. Method headings should NOT include type and method name in backticks ' - '(e.g., FileEntry.GetProcessingState, not `FileEntry.GetProcessingState`)' - ) - report_lines.append( - '7. Type/Interface/Struct headings should NOT include the type name in backticks ' - '(e.g., Package, not `Package`)' - ) - report_lines.append( - '8. 
All type, interface, struct, function, and method definitions should have ' - 'comments preceding them' - ) - report_lines.append('') - - # Summary - total_files = len(results) - total_blocks = sum(len(r['code_blocks']) for r in results) - total_issues = sum(len(r['issues']) for r in results) - - report_lines.append('## Summary') - report_lines.append('') - report_lines.append(f'- Files audited: {total_files}') - report_lines.append(f'- Total Go code blocks found: {total_blocks}') - report_lines.append(f'- Total issues found: {total_issues}') - report_lines.append('') - - if total_issues == 0: - report_lines.append('✅ All Go code blocks comply with the requirements!') - report_lines.append('') - else: - report_lines.append('## Issues Found') - report_lines.append('') - - # Group issues by type - issues_by_type = defaultdict(list) - for result in results: - for issue in result['issues']: - # Convert ValidationIssue to dict if needed - if isinstance(issue, ValidationIssue): - issue = issue.to_dict() - issues_by_type[issue['type']].append((result['file'], issue)) - - for issue_type, issues in sorted(issues_by_type.items()): - report_lines.append(f'### {issue_type.replace("_", " ").title()} Issues') - report_lines.append('') - - for file_path, issue in issues: - report_lines.append(f'**File:** `{file_path}`') - if 'start_line' in issue: - report_lines.append(f'**Lines:** {issue["start_line"]}-{issue["end_line"]}') - if 'heading' in issue: - report_lines.append(f'**Heading:** {issue["heading"]}') - if 'type_count' in issue: - report_lines.append(f'**Type definitions found:** {issue["type_count"]}') - if 'func_count' in issue: - report_lines.append(f'**Func definitions found:** {issue["func_count"]}') - if 'func_type_count' in issue: - report_lines.append( - f'**Function type definitions found:** ' - f'{issue["func_type_count"]}' - ) - if 'blocks' in issue: - block_info = ', '.join(f'lines {s}-{e}' for s, e in issue['blocks']) - report_lines.append(f'**Code blocks:** 
{block_info}') - if 'def_name' in issue: - report_lines.append(f'**Definition:** {issue["def_name"]}') - if 'def_kind' in issue: - report_lines.append(f'**Definition kind:** {issue["def_kind"]}') - report_lines.append(f'**Issue:** {issue["message"]}') - report_lines.append('') - - # Detailed file-by-file breakdown - report_lines.append('## Detailed File Breakdown') - report_lines.append('') - - for result in sorted(results, key=lambda x: x['file']): - if result['code_blocks'] or result['issues']: - file_name = Path(result["file"]).stem - report_lines.append(f'### {file_name}') - report_lines.append('') - report_lines.append(f'**File path:** `{result["file"]}`') - report_lines.append(f'**Code blocks:** {len(result["code_blocks"])}') - report_lines.append(f'**Issues:** {len(result["issues"])}') - report_lines.append('') - - if result['code_blocks']: - report_lines.append(f'#### {file_name} Code Blocks') - report_lines.append('') - for i, block in enumerate(result['code_blocks'], 1): - report_lines.append( - f'Code block {i}: Lines ' - f'{block["start_line"]}-{block["end_line"]}' - ) - report_lines.append('') - report_lines.append(f'- Heading: {block["heading"] or "(none)"}') - report_lines.append(f'- Type definitions: {block["type_count"]}') - report_lines.append(f'- Func definitions: {block["func_count"]}') - if block.get("func_type_count", 0) > 0: - report_lines.append( - f'- Function type definitions: ' - f'{block["func_type_count"]}' - ) - report_lines.append('') - - if result['issues']: - report_lines.append(f'#### {file_name} Issues') - report_lines.append('') - for issue in result['issues']: - report_lines.append(f'- {issue["message"]}') - if 'start_line' in issue: - report_lines.append( - f' - Lines: {issue["start_line"]}-' - f'{issue["end_line"]}' - ) - report_lines.append('') - - # Write report - output_path.parent.mkdir(parents=True, exist_ok=True) - output_path.write_text('\n'.join(report_lines), encoding='utf-8') - - -def print_summary(results, 
output=None, verbose=False, no_color=False): - """ - Print summary of audit results. - - Args: - results: List of audit results - output: Optional OutputBuilder instance (creates new one if None) - verbose: Verbose mode flag - no_color: Disable colors flag - """ - if output is None: - output = OutputBuilder(no_color=no_color, verbose=verbose) - # If creating new output, add header - output.add_header("Go Code Blocks Validation", - "Validates Go code blocks in tech specs") - - total_files = len(results) - total_blocks = sum(len(r['code_blocks']) for r in results) - total_issues = sum(len(r['issues']) for r in results) - - # Summary section - output.add_summary_header() - summary_items = [ - ("Files audited:", total_files), - ("Total code blocks:", total_blocks), - ("Total issues found:", total_issues), - ] - output.add_summary_section(summary_items) - - # Group issues by type (includes all types: duplicate_heading, multiple_funcs, etc.) - issues_by_type = defaultdict(int) - for result in results: - for issue in result['issues']: - # Convert ValidationIssue to get type - if isinstance(issue, ValidationIssue): - issue_type = issue.issue_type - else: - issue_type = issue.get('type', 'unknown') - issues_by_type[issue_type] += 1 - - # Print breakdown of all issue types found (non-verbose summary) - if issues_by_type: - output.add_blank_line("summary") - output.add_line('Breakdown by issue type:', section="summary") - breakdown_items = [ - (issue_type.replace('_', ' ').title() + ':', count) - for issue_type, count in sorted(issues_by_type.items()) - ] - output.add_summary_section(breakdown_items) - - # Group issues by type for display - uses format_issue_message - # Errors are always shown, warnings are verbose-only - if issues_by_type: - # Only add errors header if there are errors (not just warnings) - has_errors = any( - (isinstance(issue, ValidationIssue) and - issue.issue_type != "function_type_warning") or - (not isinstance(issue, ValidationIssue) and - 
issue.get('type') != "function_type_warning") - for result in results - for issue in result['issues'] - ) - if has_errors: - output.add_errors_header() - output.add_blank_line("error") - - issues_by_type_list = defaultdict(list) - for result in results: - for issue in result['issues']: - # Convert ValidationIssue to get type - if isinstance(issue, ValidationIssue): - issue_type = issue.issue_type - issue_dict = issue.to_dict() - else: - issue_type = issue.get('type', 'unknown') - issue_dict = issue - issues_by_type_list[issue_type].append((result['file'], issue_dict)) - - for issue_type, issues in sorted(issues_by_type_list.items()): - # Determine severity based on issue type - severity = "warning" if issue_type == "function_type_warning" else "error" - - for file_path, issue in issues: - # Build message details - message_parts = [] - if 'heading' in issue: - message_parts.append(f'Heading: {issue["heading"]}') - if 'type_count' in issue: - message_parts.append(f'Type definitions: {issue["type_count"]}') - if 'func_count' in issue: - message_parts.append(f'Func definitions: {issue["func_count"]}') - if 'func_type_count' in issue: - message_parts.append(f'Function type definitions: {issue["func_type_count"]}') - if 'func_name' in issue: - message_parts.append(f'Function/Method: {issue["func_name"]}') - if 'receiver_type' in issue and issue['receiver_type']: - message_parts.append(f'Receiver: {issue["receiver_type"]}') - if 'type_name' in issue: - message_parts.append(f'Type: {issue["type_name"]}') - if 'kind' in issue: - message_parts.append(f'Kind: {issue["kind"]}') - if 'def_name' in issue: - message_parts.append(f'Definition: {issue["def_name"]}') - if 'def_kind' in issue: - message_parts.append(f'Definition kind: {issue["def_kind"]}') - if 'blocks' in issue: - block_info = ', '.join(f'lines {s}-{e}' for s, e in issue['blocks']) - message_parts.append(f'Code blocks: {block_info}') - - # Use the main message, append details if any - message = issue.get('message', 
'') - if message_parts: - message = f"{message} ({', '.join(message_parts)})" - - # Format issue type name - issue_type_name = issue_type.replace('_', ' ').title() - - # Use format_issue_message - line_num = issue.get('start_line') - suggestion = issue.get('suggestion') - formatted_msg = format_issue_message( - severity=severity, - issue_type=issue_type_name, - file_path=file_path, - line_num=line_num, - message=message, - suggestion=suggestion, - no_color=no_color - ) - - # Add to appropriate output section - # Errors are always shown, warnings are verbose-only - if severity == "warning": - output.add_warning_line(formatted_msg, verbose_only=True) - else: - output.add_error_line(formatted_msg, verbose_only=False) - - # Add final message (mutually exclusive) - if total_issues == 0: - output.add_success_message("All Go code blocks comply with the requirements!") - else: - output.add_failure_message("Validation failed. Please fix the errors above.") - - return output - - def main(): """Main entry point.""" @@ -1022,9 +735,8 @@ def main(): # Print all output at once output.print() - # Exit with error code if issues found (unless --no-fail is set) - total_issues = sum(len(r['issues']) for r in results) - has_errors = total_issues > 0 + # Exit with error code only if non-warning issues found (unless --no-fail is set) + has_errors = _has_non_warning_errors(results) return get_validation_exit_code(has_errors, no_fail) diff --git a/scripts/validate_go_signature_sync.py b/scripts/validate_go_signature_sync.py index 5698e46a..242b89cd 100644 --- a/scripts/validate_go_signature_sync.py +++ b/scripts/validate_go_signature_sync.py @@ -18,34 +18,36 @@ --specs-dir DIR Directory containing tech specs (default: docs/tech_specs) --impl-dir DIR Directory containing Go implementation (default: api/go) --output, -o FILE Output file path for validation report (default: stdout) + --no-color, --nocolor Disable colored output --help, -h Show this help message """ import argparse +import 
functools import re +import shutil +import subprocess # nosec B404 import sys from pathlib import Path -from typing import Dict, List, Optional, Set, Tuple +from typing import Dict, List, Set, Tuple -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" - -# Import shared utilities -for module_path in (str(scripts_dir), str(lib_dir)): - if module_path not in sys.path: - sys.path.insert(0, module_path) - -from lib._validation_utils import ( # noqa: E402 +from lib._validation_utils import ( OutputBuilder, get_workspace_root, parse_no_color_flag, ValidationIssue, DOCS_DIR, TECH_SPECS_DIR ) -from lib._go_code_utils import ( # noqa: E402 +from lib._go_code_utils import ( parse_go_def_signature, find_go_code_blocks, Signature, normalize_go_signature_with_params, extract_interfaces_from_go_file, extract_interfaces_from_markdown ) +from lib._validate_go_signature_sync_helpers import ( + emit_extra_in_impl_section, + emit_mismatches, + emit_missing_in_impl, + emit_sync_final, +) _EMPTY_INTERFACE_RE = re.compile(r'\binterface\s*\{\s*\}') @@ -64,8 +66,189 @@ r'^\s*type\s+(\w+)\s*\[[^\]]+\]\s*=\s+[\w.]+\.(\w+)\s*\[' ) -# Module-level cache for re-exported types from novuspack.go -_REEXPORTED_TYPES: Optional[Set[str]] = None + +def _go_list_public_package_dirs(impl_dir: Path) -> List[Path]: + """ + Return directories of packages that are part of the Go toolchain's public surface. + + Uses `go list ./...` from impl_dir so that: + - Directories Go ignores (e.g. starting with `_`) are excluded. + - Only importable packages are considered. + Excludes packages whose import path contains `/internal/` so that the + validator only reports on the externally visible API surface. + + Returns: + List of absolute Paths to package directories. 
+ """ + impl_dir_abs = impl_dir.resolve() + go_path = shutil.which('go') + if not go_path: + print( + f"Warning: 'go' not found in PATH (impl_dir={impl_dir_abs})", + file=sys.stderr, + ) + return [] + try: + result = subprocess.run( # nosec B603 + [go_path, 'list', '-f', '{{.ImportPath}} {{.Dir}}', './...'], + cwd=str(impl_dir_abs), + capture_output=True, + text=True, + check=False, + timeout=60, + ) + except (FileNotFoundError, subprocess.TimeoutExpired) as e: + print( + f"Warning: could not run 'go list ./...' in {impl_dir_abs}: {e}", + file=sys.stderr, + ) + return [] + if result.returncode: + print( + f"Warning: 'go list ./...' failed in {impl_dir_abs}: {result.stderr}", + file=sys.stderr, + ) + return [] + dirs: List[Path] = [] + for line in result.stdout.strip().splitlines(): + line = line.strip() + if not line: + continue + parts = line.split(' ', 1) + if len(parts) != 2: + continue + import_path, pkg_dir = parts[0], parts[1] + if '/internal/' in import_path or import_path.endswith('/internal'): + continue + dirs.append(Path(pkg_dir).resolve()) + return dirs + + +def _is_exported_receiver(sig: Signature) -> bool: + """ + Return True if the signature is a function (no receiver) or a method on an exported type. + + Methods on unexported receiver types are not part of the public API surface. + """ + if not sig.receiver: + return True + receiver_type = sig.receiver.strip('*').strip() + if not receiver_type or '[' in receiver_type: + base = receiver_type.split('[')[0].strip() + return bool(base and base[0].isupper()) + return receiver_type[0].isupper() + + +def _is_public_surface_path(file_path: Path, impl_dir: Path) -> bool: + """ + Return False if the file is under internal/ or a directory whose name starts with _. + + Matches Go visibility: internal packages and _-prefixed dirs (e.g. _bdd) are not public. 
+ """ + try: + rel = file_path.resolve().relative_to(impl_dir.resolve()) + except ValueError: + return True + parts = rel.parts + for part in parts: + if part == 'internal' or part.startswith('_'): + return False + return True + + +def _gather_go_files( + impl_dir: Path, public_dirs: List[Path], verbose: bool +) -> List[Path]: + """ + Build list of non-test Go files from public package dirs or fallback to rglob. + + When go list succeeds, only files in public_dirs are used (Go toolchain view). + When go list fails, fallback rglob still excludes internal/ and _-prefixed dirs + so visibility matches Go semantics as much as possible without go list. + """ + if public_dirs: + go_files = [] + for pkg_dir in public_dirs: + for f in pkg_dir.glob('*.go'): + if not f.name.endswith('_test.go'): + go_files.append(f) + if verbose: + print( + f"Scanning {len(go_files)} Go files in {len(public_dirs)} public package(s)..." + ) + return go_files + go_files = list(impl_dir.rglob('*.go')) + go_files = [ + f for f in go_files + if not f.name.endswith('_test.go') and _is_public_surface_path(f, impl_dir) + ] + if verbose: + msg = ( + "Warning: 'go list' returned no public packages; " + f"scanning {len(go_files)} Go files (excluding internal/ and _* dirs)." + ) + print(msg) + elif go_files: + print( + "Warning: 'go list ./...' 
failed or returned no packages; " + "scanning Go files excluding internal/ and _* directories.", + file=sys.stderr, + ) + return go_files + + +def _process_one_impl_sig( + sig: Signature, + signatures: Dict[str, Signature], + issues: List[ValidationIssue], + verbose: bool, +) -> None: + """Append issues for empty interface/any; add to signatures if public and exported.""" + if signature_has_empty_interface_input(sig): + file_path, line_num = parse_location(sig.location) + issues.append(ValidationIssue.create( + "forbidden_empty_interface", + file_path, + line_num, + line_num, + message=( + f"forbidden empty interface parameter type found in " + f"implementation signature '{sig.normalized_key()}' at {sig.location}: " + f"{sig.normalized_signature()}" + ), + severity='error', + signature_key=sig.normalized_key(), + location=sig.location + )) + if signature_uses_any_type(sig): + file_path, line_num = parse_location(sig.location) + issues.append(ValidationIssue.create( + "discouraged_any_type", + file_path, + line_num, + line_num, + message=( + f"discouraged any type usage found in " + f"implementation signature '{sig.normalized_key()}' at {sig.location}: " + f"{sig.normalized_signature()}" + ), + severity='warning', + signature_key=sig.normalized_key(), + location=sig.location + )) + if sig.is_public and _is_exported_receiver(sig): + key = sig.normalized_key() + if key in signatures: + if verbose: + print(f" Warning: Duplicate signature {key} at {sig.location}") + else: + signatures[key] = sig + + +@functools.lru_cache(maxsize=1) +def _cached_reexported_types(repo_root_str: str) -> Set[str]: + """Return re-exported types from novuspack.go (cached by repo root).""" + return extract_reexported_types_from_novuspack(Path(repo_root_str)) def signature_has_empty_interface_input(sig: Signature) -> bool: @@ -145,7 +328,7 @@ def extract_signatures_from_go_file(file_path: Path) -> List[Signature]: is_public=sig.is_public )) - except Exception as e: + except (MemoryError, 
RuntimeError, BufferError) as e: print(f"Warning: Error reading {file_path}: {e}") return signatures @@ -171,7 +354,7 @@ def extract_signatures_from_markdown_file(file_path: Path) -> List[Signature]: # Extract other signatures (functions, methods, types) that aren't interfaces go_blocks = find_go_code_blocks(content) - for start_line, end_line, code_content in go_blocks: + for start_line, _end_line, code_content in go_blocks: block_lines = code_content.split('\n') for i, line in enumerate(block_lines): @@ -213,7 +396,7 @@ def extract_signatures_from_markdown_file(file_path: Path) -> List[Signature]: except UnicodeDecodeError as e: # Encoding errors - log to stderr print(f"Warning: Could not decode {file_path} (encoding issue): {e}", file=sys.stderr) - except Exception as e: + except (MemoryError, RuntimeError, BufferError) as e: # Unexpected errors - log to stderr print(f"Warning: Unexpected error reading {file_path}: {e}", file=sys.stderr) @@ -235,60 +418,24 @@ def parse_location(location_str: str) -> Tuple[Path, int]: def collect_go_signatures( impl_dir: Path, verbose: bool = False ) -> Tuple[Dict[str, Signature], List[ValidationIssue]]: - """Collect all public signatures from Go implementation files.""" + """ + Collect public signatures from Go implementation files. + + Only scans packages that are part of the Go toolchain's importable surface + (via `go list ./...`), excluding `internal/` packages, and only counts + methods on exported receiver types as public. 
+ """ signatures = {} issues: List[ValidationIssue] = [] - - # Find all .go files, excluding test files - go_files = list(impl_dir.rglob('*.go')) - go_files = [f for f in go_files if not f.name.endswith('_test.go')] - - if verbose: - print(f"Scanning {len(go_files)} Go files...") - + public_dirs = _go_list_public_package_dirs(impl_dir) + go_files = _gather_go_files(impl_dir, public_dirs, verbose) + if verbose and public_dirs and not go_files: + print(f" (No non-test .go files in {len(public_dirs)} package dirs)") for go_file in go_files: if verbose: print(f" Reading {go_file.relative_to(impl_dir)}") - - file_sigs = extract_signatures_from_go_file(go_file) - for sig in file_sigs: - if signature_has_empty_interface_input(sig): - file_path, line_num = parse_location(sig.location) - issues.append(ValidationIssue( - "forbidden_empty_interface", - file_path, - line_num, - line_num, - f"forbidden empty interface parameter type found in " - f"implementation signature '{sig.normalized_key()}' at {sig.location}: " - f"{sig.normalized_signature()}", - severity='error', - signature_key=sig.normalized_key(), - location=sig.location - )) - if signature_uses_any_type(sig): - file_path, line_num = parse_location(sig.location) - issues.append(ValidationIssue( - "discouraged_any_type", - file_path, - line_num, - line_num, - f"discouraged any type usage found in " - f"implementation signature '{sig.normalized_key()}' at {sig.location}: " - f"{sig.normalized_signature()}", - severity='warning', - signature_key=sig.normalized_key(), - location=sig.location - )) - if sig.is_public: # Only track public signatures - key = sig.normalized_key() - if key in signatures: - # Duplicate - keep the first one, but note the conflict - if verbose: - print(f" Warning: Duplicate signature {key} at {sig.location}") - else: - signatures[key] = sig - + for sig in extract_signatures_from_go_file(go_file): + _process_one_impl_sig(sig, signatures, issues, verbose) return signatures, issues @@ -313,28 
+460,32 @@ def collect_spec_signatures( for sig in file_sigs: if signature_has_empty_interface_input(sig): file_path, line_num = parse_location(sig.location) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "forbidden_empty_interface", file_path, line_num, line_num, - f"forbidden empty interface parameter type found in " - f"spec signature '{sig.normalized_key()}' at {sig.location}: " - f"{sig.normalized_signature()}", + message=( + f"forbidden empty interface parameter type found in " + f"spec signature '{sig.normalized_key()}' at {sig.location}: " + f"{sig.normalized_signature()}" + ), severity='error', signature_key=sig.normalized_key(), location=sig.location )) if signature_uses_any_type(sig): file_path, line_num = parse_location(sig.location) - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( "discouraged_any_type", file_path, line_num, line_num, - f"discouraged any type usage found in " - f"spec signature '{sig.normalized_key()}' at {sig.location}: " - f"{sig.normalized_signature()}", + message=( + f"discouraged any type usage found in " + f"spec signature '{sig.normalized_key()}' at {sig.location}: " + f"{sig.normalized_signature()}" + ), severity='warning', signature_key=sig.normalized_key(), location=sig.location @@ -360,15 +511,9 @@ def get_public_api_types() -> List[str]: Returns: List of public API type names """ - global _REEXPORTED_TYPES - - # If we haven't loaded re-exported types yet, try to load them - if _REEXPORTED_TYPES is None: - repo_root = get_workspace_root() - _REEXPORTED_TYPES = extract_reexported_types_from_novuspack(repo_root) - - # Start with re-exported types - public_types = list(_REEXPORTED_TYPES) if _REEXPORTED_TYPES else [] + repo_root = get_workspace_root() + reexported = _cached_reexported_types(str(repo_root)) + public_types = list(reexported) if reexported else [] # Add fallback types that might not be in novuspack.go but are still public API fallback_types = [ @@ -472,7 +617,7 @@ 
def extract_reexported_types_from_novuspack(root_dir: Path) -> Set[str]: except UnicodeDecodeError as e: # Encoding errors - log to stderr print(f"Warning: Could not decode novuspack.go (encoding issue): {e}", file=sys.stderr) - except Exception as e: + except (MemoryError, RuntimeError, BufferError) as e: # Unexpected errors - log to stderr print( f"Warning: Unexpected error parsing novuspack.go for re-exports: {e}", @@ -482,68 +627,29 @@ def extract_reexported_types_from_novuspack(root_dir: Path) -> Set[str]: return reexported_types -def is_high_confidence_helper(sig: Signature) -> Tuple[bool, List[str]]: - """ - Determine if a signature is a high-confidence helper function. +def _is_public_api_method(sig: Signature) -> bool: + """Return True if sig is a method on a known public API type.""" + if not sig.receiver: + return False + receiver = sig.receiver.strip('*') + if not receiver or not receiver[0].isupper(): + return False + public_api_types = get_public_api_types() + receiver_lower = receiver.lower() + for api_type in public_api_types: + if receiver_lower == api_type.lower(): + return True + if '[' in receiver: + base_type = receiver.split('[')[0].strip() + return bool(base_type and base_type[0].isupper()) + return False - This function is conservative: methods on public API types require - very strong evidence to be considered helpers. 
- Returns: - - (is_helper, reasons): Tuple of boolean and list of reason strings - """ - reasons = [] +def _helper_score_from_path( + file_name: str, parent_dir: str, location_lower: str, reasons: List[str] +) -> int: + """Add path-based helper score and reasons; return score delta.""" score = 0 - - # Parse file path from location (format: "path/to/file.go:123") - location_parts = sig.location.split(':') - file_path_str = location_parts[0] - file_path = Path(file_path_str) - file_name = file_path.name.lower() - parent_dir = file_path.parent.name.lower() if file_path.parent != file_path else "" - - # Check if file is a test file (*_test.go) - # This is the strongest indicator - test files are always helpers - if file_name.endswith('_test.go'): - reasons.append("in test file (*_test.go)") - # Test files are always helpers, return early - return True, reasons - - # Check if this is a method on a public API type - # Public API types are those that start with capital letter and are - # likely part of the documented API surface - is_public_api_method = False - if sig.receiver: - receiver = sig.receiver.strip('*') # Remove pointer indicator - # Check if receiver is a public type (starts with capital) - if receiver and receiver[0].isupper(): - # Public API types are determined by what's re-exported in novuspack.go - # This ensures we only flag methods on truly public API types as errors - public_api_types = get_public_api_types() - # Check if receiver matches a known API type (case-insensitive) - receiver_lower = receiver.lower() - for api_type in public_api_types: - if receiver_lower == api_type.lower(): - is_public_api_method = True - break - # Also check if receiver looks like a generic type (contains [) - if '[' in receiver: - base_type = receiver.split('[')[0].strip() - if base_type and base_type[0].isupper(): - is_public_api_method = True - - # If this is a method on a public API type, require much stronger evidence - # to be considered a helper - if 
is_public_api_method: - # Methods on public API types need score >= 6 to be considered helpers - # (vs normal threshold of 3) - helper_threshold = 6 - reasons.append("method on public API type (requires stronger evidence)") - else: - # Normal threshold for non-API methods - helper_threshold = 3 - - # Check filename patterns if 'helper' in file_name: score += 3 reasons.append("filename contains 'helper'") @@ -551,20 +657,14 @@ def is_high_confidence_helper(sig: Signature) -> Tuple[bool, List[str]]: score += 3 reasons.append("filename contains 'internal'") if 'test' in file_name and not file_name.endswith('_test.go'): - # Test in filename but not a test file pattern score += 2 reasons.append("filename contains 'test'") - - # Check parent directory patterns if 'helper' in parent_dir: score += 3 reasons.append("parent directory contains 'helper'") if 'internal' in parent_dir: score += 3 reasons.append("parent directory contains 'internal'") - - # Check package name in full file path - location_lower = sig.location.lower() if '/internal/' in location_lower: score += 3 reasons.append("in internal package path") @@ -577,31 +677,59 @@ def is_high_confidence_helper(sig: Signature) -> Tuple[bool, List[str]]: if '/_bdd/' in location_lower: score += 2 reasons.append("in BDD test package") + return score - # Check function/method name patterns - name_lower = sig.name.lower() + +def _helper_score_from_name_receiver(sig: Signature, reasons: List[str]) -> int: + """Add name/receiver-based helper score and reasons; return score delta.""" helper_keywords = ['helper', 'internal', 'util', 'test', 'mock', 'stub'] + score = 0 + name_lower = sig.name.lower() for keyword in helper_keywords: if keyword in name_lower: score += 2 reasons.append(f"name contains '{keyword}'") - - # Check receiver type (for methods) if sig.receiver: receiver_lower = sig.receiver.lower() for keyword in helper_keywords: if keyword in receiver_lower: score += 2 reasons.append(f"receiver contains '{keyword}'") - # 
Check for test-related receiver types if receiver_lower.startswith('test') or receiver_lower.endswith('test'): score += 2 reasons.append("receiver is test-related type") + return score + + +def is_high_confidence_helper(sig: Signature) -> Tuple[bool, List[str]]: + """ + Determine if a signature is a high-confidence helper function. + + This function is conservative: methods on public API types require + very strong evidence to be considered helpers. - # Use threshold based on whether this is a public API method - is_helper = score >= helper_threshold + Returns: + - (is_helper, reasons): Tuple of boolean and list of reason strings + """ + reasons: List[str] = [] + location_parts = sig.location.split(':') + file_path = Path(location_parts[0]) + file_name = file_path.name.lower() + parent_dir = file_path.parent.name.lower() if file_path.parent != file_path else "" + if file_name.endswith('_test.go'): + reasons.append("in test file (*_test.go)") + return True, reasons - return is_helper, reasons + is_public_api_method = _is_public_api_method(sig) + helper_threshold = 6 if is_public_api_method else 3 + if is_public_api_method: + reasons.append("method on public API type (requires stronger evidence)") + + score = _helper_score_from_path( + file_name, parent_dir, sig.location.lower(), reasons + ) + score += _helper_score_from_name_receiver(sig, reasons) + return (score >= helper_threshold, reasons) def compare_signatures( @@ -641,91 +769,124 @@ def compare_signatures( return mismatches, missing_in_impl, extra_in_impl +def _is_public_api_receiver(receiver: str, public_api_types: List[str]) -> bool: + """Return True if receiver is a known public API type (or generic of one).""" + if not receiver or not receiver[0].isupper(): + return False + receiver_lower = receiver.lower() + for api_type in public_api_types: + if receiver_lower == api_type.lower(): + return True + if '[' in receiver: + base_type = receiver.split('[')[0].strip() + if base_type and base_type[0].isupper(): + 
base_lower = base_type.lower() + for api_type in public_api_types: + if base_lower == api_type.lower(): + return True + return False + + +def _classify_extra_in_impl( + extra_in_impl: List[str], + impl_sigs: Dict[str, Signature], + public_api_types: List[str], +) -> Tuple[ + List[Tuple[str, Signature, List[str]]], + List[str], + List[Tuple[str, Signature]], +]: + """Classify extra implementation keys into helpers, low-confidence, or API errors.""" + high_confidence_helpers: List[Tuple[str, Signature, List[str]]] = [] + low_confidence_extra: List[str] = [] + errors_public_api_missing: List[Tuple[str, Signature]] = [] + for key in extra_in_impl: + impl_sig = impl_sigs[key] + is_helper, reasons = is_high_confidence_helper(impl_sig) + if is_helper: + high_confidence_helpers.append((key, impl_sig, reasons)) + elif impl_sig.receiver and _is_public_api_receiver( + impl_sig.receiver.strip('*'), public_api_types + ): + errors_public_api_missing.append((key, impl_sig)) + else: + low_confidence_extra.append(key) + return high_confidence_helpers, low_confidence_extra, errors_public_api_missing + + def main(): - parser = argparse.ArgumentParser( - description='Check Go signatures against tech specs' - ) - parser.add_argument( - '--verbose', '-v', - action='store_true', - help='Show detailed progress information' - ) - parser.add_argument( - '--specs-dir', - type=str, - default=f'{DOCS_DIR}/{TECH_SPECS_DIR}', - help=f'Directory containing tech specs (default: {DOCS_DIR}/{TECH_SPECS_DIR})' - ) + """Entry point: parse args, check paths, run validation.""" + parser = argparse.ArgumentParser(description='Check Go signatures against tech specs') + parser.add_argument('--verbose', '-v', action='store_true', help='Show detailed progress') parser.add_argument( - '--impl-dir', - type=str, - default='api/go', - help='Directory containing Go implementation (default: api/go)' + '--specs-dir', type=str, default=f'{DOCS_DIR}/{TECH_SPECS_DIR}', + help=f'Directory containing tech specs 
(default: {DOCS_DIR}/{TECH_SPECS_DIR})', ) parser.add_argument( - '--output', '-o', - type=str, - help='Output file path for validation report (default: stdout)' + '--impl-dir', type=str, default='api/go', + help='Directory containing Go implementation (default: api/go)', ) + parser.add_argument('--output', '-o', type=str, help='Output file path') parser.add_argument( - '--no-fail', + '--no-color', + '--nocolor', action='store_true', - help='Exit with code 0 even if errors are found' + help='Disable colored output', ) - + parser.add_argument('--no-fail', action='store_true', help='Exit 0 even if errors found') args = parser.parse_args() - # Determine paths repo_root = get_workspace_root() specs_dir = repo_root / args.specs_dir impl_dir = repo_root / args.impl_dir - if not specs_dir.exists(): print(f"Error: Specs directory not found: {specs_dir}", file=sys.stderr) sys.exit(1) - if not impl_dir.exists(): print(f"Error: Implementation directory not found: {impl_dir}", file=sys.stderr) sys.exit(1) - # Create output builder (header streams immediately if verbose) no_color = getattr(args, 'nocolor', False) or parse_no_color_flag(sys.argv) output = OutputBuilder( "Go Signature Sync Validation", "Checks Go signatures in implementation against tech specs", no_color=no_color, verbose=args.verbose, - output_file=args.output + output_file=args.output, ) + _main_run(args, specs_dir, impl_dir, output, no_color) - # Collect signatures + +def _main_run( + args, + specs_dir: Path, + impl_dir: Path, + output: OutputBuilder, + no_color: bool, +) -> None: + """Run validation after paths and output are set (collect, compare, emit).""" if args.verbose: output.add_verbose_line("Collecting signatures from Go implementation...") impl_sigs, impl_issues = collect_go_signatures(impl_dir, args.verbose) - if args.verbose: output.add_verbose_line(f"Found {len(impl_sigs)} public signatures in implementation") output.add_blank_line("working_verbose") output.add_verbose_line("Collecting signatures 
from tech specs...") - spec_sigs, spec_issues = collect_spec_signatures( - specs_dir, args.verbose - ) - + spec_sigs, spec_issues = collect_spec_signatures(specs_dir, args.verbose) if args.verbose: output.add_verbose_line(f"Found {len(spec_sigs)} public signatures in specs") output.add_blank_line("working_verbose") output.add_verbose_line("Comparing signatures...") - # Combine all issues and filter in a single loop all_issues = impl_issues + spec_issues - empty_interface_errors = [] - any_type_warnings = [] - for issue in all_issues: - if issue.matches(issue_type='forbidden_empty_interface', severity='error'): - empty_interface_errors.append(issue) - if issue.matches(issue_type='discouraged_any_type', severity='warning'): - any_type_warnings.append(issue) - + empty_interface_errors = [ + i for i in all_issues + if i.matches(issue_type='forbidden_empty_interface', severity='error') + ] + any_type_warnings = [ + i for i in all_issues + if i.matches(issue_type='discouraged_any_type', severity='warning') + ] if empty_interface_errors: for error in empty_interface_errors: output.add_error_line(error.format_message(no_color=no_color)) @@ -735,247 +896,56 @@ def main(): output.add_warnings_header() output.add_line( f"Found {len(any_type_warnings)} signature(s) using discouraged any type.", - section="warning" + section="warning", ) if args.verbose: - for warning in any_type_warnings: - output.add_warning_line(warning.format_message(no_color=no_color)) + for w in any_type_warnings: + output.add_warning_line(w.format_message(no_color=no_color)) else: - max_show = 25 - for warning in any_type_warnings[:max_show]: - output.add_warning_line(warning.format_message(no_color=no_color)) - suppressed = len(any_type_warnings) - max_show - if suppressed > 0: + for w in any_type_warnings[:25]: + output.add_warning_line(w.format_message(no_color=no_color)) + if len(any_type_warnings) > 25: output.add_warning_line( - f"{suppressed} additional warning(s) suppressed. 
" + f"{len(any_type_warnings) - 25} additional warning(s) suppressed. " "Re-run with --verbose to see all." ) - # Compare mismatches, missing_in_impl, extra_in_impl = compare_signatures(impl_sigs, spec_sigs) - - # Report results - has_errors = False - has_warnings = False - - # Errors: Mismatched signatures - if mismatches: - has_errors = True - output.add_errors_header() - output.add_line( - f"Found {len(mismatches)} signature mismatch(es):", - section="error" - ) - output.add_blank_line("error") - - for key, impl_sig, spec_sig in sorted(mismatches): - output.add_error_line(f"Signature: {key}") - output.add_error_line(f" Implementation: {impl_sig.normalized_signature()}") - output.add_error_line(f" Location: {impl_sig.location}") - output.add_error_line(f" Specification: {spec_sig.normalized_signature()}") - output.add_error_line(f" Location: {spec_sig.location}") - - # Warnings: Missing in implementation - if missing_in_impl: - has_warnings = True - output.add_warnings_header() - output.add_line( - f"Found {len(missing_in_impl)} signature(s) in specs " - f"but not in implementation:", - section="warning" - ) - output.add_blank_line("warning") - - for key in sorted(missing_in_impl): - spec_sig = spec_sigs[key] - output.add_warning_line(f" {key}") - output.add_warning_line(f" Signature: {spec_sig.normalized_signature()}") - output.add_warning_line(f" Location: {spec_sig.location}") - - # Errors/Warnings: Extra in implementation (filter out high-confidence helpers) - if extra_in_impl: - high_confidence_helpers = [] - low_confidence_extra = [] - errors_public_api_missing = [] - - for key in extra_in_impl: - impl_sig = impl_sigs[key] - is_helper, reasons = is_high_confidence_helper(impl_sig) - if is_helper: - high_confidence_helpers.append((key, impl_sig, reasons)) - else: - # Check if this is a public method on a public API type - # These should be errors, not warnings - is_public_api_error = False - if impl_sig.receiver: - receiver = impl_sig.receiver.strip('*') - 
if receiver and receiver[0].isupper(): - # Known public API types - public_api_types = [ - 'FileEntry', 'Package', 'PackageReader', 'PackageWriter', - 'ConfigBuilder', 'Tag', 'Option', 'PathMetadataEntry', - 'FileStream', 'BufferPool', 'ErrorType', 'PackageError', - 'FileInfo', 'PathInfo', 'AddFileOptions', 'ExtractPathOptions', - 'RemoveDirectoryOptions', 'ProcessingState', 'FileSource', - 'TransformPipeline', 'TransformStage', 'SignatureInfo', - 'SigningKey', 'CompressionStrategy', 'EncryptionStrategy' - ] - receiver_lower = receiver.lower() - for api_type in public_api_types: - if receiver_lower == api_type.lower(): - is_public_api_error = True - break - # Also check generic types - if '[' in receiver: - base_type = receiver.split('[')[0].strip() - if base_type and base_type[0].isupper(): - base_lower = base_type.lower() - for api_type in public_api_types: - if base_lower == api_type.lower(): - is_public_api_error = True - break - - if is_public_api_error: - errors_public_api_missing.append((key, impl_sig)) - else: - low_confidence_extra.append(key) - - # Report suppressed helpers - if high_confidence_helpers: - if args.verbose: - output.add_verbose_line( - f"Suppressed {len(high_confidence_helpers)} high-confidence " - f"helper function(s)" - ) - output.add_blank_line("working_verbose") - for key, impl_sig, reasons in sorted(high_confidence_helpers): - output.add_verbose_line(f" {key}") - output.add_verbose_line(f" Signature: {impl_sig.normalized_signature()}") - output.add_verbose_line(f" Location: {impl_sig.location}") - output.add_verbose_line(f" Reasons: {', '.join(reasons)}") - - # Report errors: Public API methods missing from specs - if errors_public_api_missing: - has_errors = True - output.add_errors_header() - output.add_line( - f"Found {len(errors_public_api_missing)} public API method(s) " - f"in implementation but not in specs:", - section="error" - ) - output.add_blank_line("error") - output.add_line( - "These are public methods on public API 
types and MUST be documented " - "in tech specs.", - section="error" - ) - output.add_blank_line("error") - - for key, impl_sig in sorted(errors_public_api_missing): - output.add_error_line(f"Signature: {key}") - output.add_error_line(f" Implementation: {impl_sig.normalized_signature()}") - output.add_error_line(f" Location: {impl_sig.location}") - - # Report low-confidence extra functions - if low_confidence_extra: - has_warnings = True - output.add_warnings_header() - output.add_line( - f"Found {len(low_confidence_extra)} signature(s) in " - f"implementation but not in specs:", - section="warning" - ) - if high_confidence_helpers: - if args.verbose: - output.add_line( - f"(Suppressed {len(high_confidence_helpers)} high-confidence " - f"helper function(s) - see above)", - section="warning" - ) - else: - output.add_line( - f"(Suppressed {len(high_confidence_helpers)} high-confidence " - f"helper function(s) - use --verbose to see them)", - section="warning" - ) - if errors_public_api_missing: - output.add_line( - f"(Also found {len(errors_public_api_missing)} public API method(s) " - f"missing from specs - see errors above)", - section="warning" - ) - output.add_blank_line("warning") - output.add_line( - "(These may be helper functions, but should be checked)", - section="warning" - ) - output.add_blank_line("warning") - - for key in sorted(low_confidence_extra): - impl_sig = impl_sigs[key] - output.add_warning_line(f" {key}") - output.add_warning_line(f" Signature: {impl_sig.normalized_signature()}") - output.add_warning_line(f" Location: {impl_sig.location}") - elif high_confidence_helpers and not args.verbose: - # Only helpers, no low-confidence extras, and not verbose - output.add_verbose_line( - f"Suppressed {len(high_confidence_helpers)} high-confidence " - f"helper function(s)" - ) - output.add_verbose_line("(Use --verbose to see the list of suppressed helpers)") - - # Summary - if not has_errors and not has_warnings: - output.add_success_message("All 
signatures are in sync!") - if args.verbose: - output.add_verbose_line(f" - {len(impl_sigs)} signatures in implementation") - output.add_verbose_line(f" - {len(spec_sigs)} signatures in specs") - output.print() - sys.exit(0) - else: - summary_parts = [] - if mismatches: - summary_parts.append(f"{len(mismatches)} mismatch(es)") - if missing_in_impl: - summary_parts.append(f"{len(missing_in_impl)} missing in implementation") - if extra_in_impl: - # Count only low-confidence extras in summary - low_confidence_count = sum( - 1 for key in extra_in_impl - if not is_high_confidence_helper(impl_sigs[key])[0] - ) - high_confidence_count = len(extra_in_impl) - low_confidence_count - if low_confidence_count > 0: - summary_parts.append( - f"{low_confidence_count} extra in implementation" - ) - if high_confidence_count > 0: - summary_parts.append( - f"{high_confidence_count} helper(s) suppressed" - ) - - # Convert summary_parts to (label, value) format - summary_items = [] - for part in summary_parts: - # Extract count and label from part (e.g., "5 mismatch(es)" -> ("Mismatch(es):", 5)) - import re as re_module - match = re_module.search(r'(\d+)\s+(.+)', part) - if match: - count = int(match.group(1)) - label = match.group(2).strip() - # Capitalize first letter - label = label[0].upper() + label[1:] if label else label - summary_items.append((f"{label}:", count)) - - if summary_items: - output.add_summary_section(summary_items) - - output.add_failure_message("Validation failed. 
Please fix the errors above.")
-        output.print()
-
-        # Exit with error code if there are actual errors (mismatches)
-        exit_code = output.get_exit_code(args.no_fail)
-        sys.exit(exit_code)
+    has_errors = emit_mismatches(output, mismatches, no_color=no_color)
+    has_warnings = emit_missing_in_impl(
+        output, missing_in_impl, spec_sigs, no_color=no_color
+    )
+    public_api_types = get_public_api_types()
+    high_confidence_helpers, low_confidence_extra, errors_public_api_missing = (
+        _classify_extra_in_impl(extra_in_impl, impl_sigs, public_api_types)
+    )
+    he_extra, hw_extra = emit_extra_in_impl_section(
+        output, args,
+        extra_in_impl=extra_in_impl,
+        impl_sigs=impl_sigs,
+        high_confidence_helpers=high_confidence_helpers,
+        low_confidence_extra=low_confidence_extra,
+        errors_public_api_missing=errors_public_api_missing,
+        no_color=no_color,
+    )
+    has_errors = has_errors or he_extra
+    has_warnings = has_warnings or hw_extra
+    low_confidence_count = sum(
+        1 for key in extra_in_impl if not is_high_confidence_helper(impl_sigs[key])[0]
+    )
+    high_confidence_count = len(extra_in_impl) - low_confidence_count
+    emit_sync_final(
+        output, args,
+        has_errors=has_errors,
+        has_warnings=has_warnings,
+        impl_sigs=impl_sigs,
+        spec_sigs=spec_sigs,
+        mismatches=mismatches,
+        missing_in_impl=missing_in_impl,
+        extra_in_impl=extra_in_impl,
+        low_confidence_count=low_confidence_count,
+        high_confidence_count=high_confidence_count,
+    )
 if __name__ == '__main__':
diff --git a/scripts/validate_go_spec_references.py b/scripts/validate_go_spec_references.py
index 0958d4d7..2f052770 100644
--- a/scripts/validate_go_spec_references.py
+++ b/scripts/validate_go_spec_references.py
@@ -21,1004 +21,13 @@
 """
 import argparse
-import re
 import sys
 from pathlib import Path
-from typing import Dict, List, Optional, Set, Tuple
-scripts_dir = Path(__file__).parent
-lib_dir = scripts_dir / "lib"
-
-# Import shared utilities
-if str(scripts_dir) not in sys.path:
-    sys.path.insert(0, str(scripts_dir))
-
-from 
lib._validation_utils import ( # noqa: E402 - OutputBuilder, get_workspace_root, format_issue_message, parse_no_color_flag, - ValidationIssue, - is_safe_path, validate_spec_file_name, validate_anchor, - extract_headings_with_section_numbers, - FileContentCache, DOCS_DIR, TECH_SPECS_DIR, - extract_h2_plus_headings_with_sections +from lib._validation_utils import ( + OutputBuilder, get_workspace_root, parse_no_color_flag, ) - - -class SpecReference: - """Represents a specification reference from a Go file.""" - - def __init__(self, file_path: Path, line_num: int, raw_ref: str): - self.file_path = file_path - self.line_num = line_num - self.raw_ref = raw_ref - self.spec_file: Optional[str] = None - self.section: Optional[str] = None - self.heading: Optional[str] = None - self.is_valid_format = False - self.function_name: Optional[str] = None # Extracted from Go code context - self.suggested_ref: Optional[str] = None # Suggested correct reference - self._parse() - - def _parse(self): - """Parse the raw reference into components. - - Expected format: file_name.md: 4.5 Descriptive Heading - Also handles: file_name.md#anchor or file_name.md Section X - """ - # Remove relative path prefixes and validate no traversal remains - ref = self.raw_ref.strip() - # Remove any path traversal attempts - ref = re.sub(r'^\.\./', '', ref) - ref = re.sub(r'^\.\.\\', '', ref) - # Check for any remaining traversal attempts - if '..' in ref or '/' in ref or '\\' in ref: - # If there's any path separator or traversal, reject it - return - - # Check for the required format: file.md: section heading - # Pattern: filename.md: section_number [.] heading_text - # Optional period after section (e.g. "1. Core" or "1 Core") for markdown-style headings. 
- pattern = r'^([a-zA-Z0-9_\-]+\.md):\s+(\d+(?:\.\d+)*)\.?\s+(.+)$' - match = re.match(pattern, ref) - - if match: - self.is_valid_format = True - self.spec_file = match.group(1) - self.section = match.group(2) - self.heading = match.group(3).strip() - else: - # Try to parse anchor format: file.md#anchor - anchor_pattern = r'^([a-zA-Z0-9_\-]+\.md)#([^#\s]+)$' - anchor_match = re.match(anchor_pattern, ref) - if anchor_match: - self.spec_file = anchor_match.group(1) - anchor = anchor_match.group(2) - # Try to extract section number from anchor - # Examples: "21-addfile" -> "2.1", "116-getmetadata" -> "1.1.6", - # "74-header" -> "7.4", "3-tag-management" -> "3" - # Pattern: digits at start, optionally with dots - section_match = re.match(r'^(\d+)(?:-|$)', anchor) - if section_match: - digits_str = section_match.group(1) - # Convert "21" -> "2.1", "116" -> "1.1.6", "74" -> "7.4", "3" -> "3" - if len(digits_str) == 1: - self.section = digits_str - elif len(digits_str) == 2: - self.section = f"{digits_str[0]}.{digits_str[1]}" - elif len(digits_str) == 3: - self.section = f"{digits_str[0]}.{digits_str[1]}.{digits_str[2]}" - elif len(digits_str) == 4: - self.section = ( - f"{digits_str[0]}.{digits_str[1]}." 
- f"{digits_str[2]}.{digits_str[3]}" - ) - return - - # Try "Section X" format - section_pattern = r'^([a-zA-Z0-9_\-]+\.md)\s+Section\s+(\d+(?:\.\d+)*)(?:\s*-\s*(.+))?$' - section_match = re.match(section_pattern, ref) - if section_match: - self.spec_file = section_match.group(1) - self.section = section_match.group(2) - if section_match.group(3): - self.heading = section_match.group(3).strip() - return - - # Invalid format - try to extract what we can for error reporting - if ':' in ref: - parts = ref.split(':', 1) - self.spec_file = parts[0].strip() - elif ref.endswith('.md'): - self.spec_file = ref.strip() - else: - # Try to extract filename if it looks like a file reference - md_match = re.search(r'([a-zA-Z0-9_\-]+\.md)', ref) - if md_match: - self.spec_file = md_match.group(1) - - def __repr__(self): - if self.is_valid_format: - return f"{self.spec_file}: {self.section} {self.heading}" - return self.raw_ref - - -class SpecValidator: - """Validates specification references.""" - - def __init__(self, repo_root: Path): - self.repo_root = repo_root - self.docs_dir = repo_root / DOCS_DIR / TECH_SPECS_DIR - self.api_go_dir = repo_root / "api" / "go" - self.index_file = self.docs_dir / "api_go_defs_index.md" - - # File content cache to avoid repeated reads - self.file_cache = FileContentCache() - - # Cache of parsed spec files (file -> set of anchors) - self.spec_anchors: Dict[str, Set[str]] = {} - # file -> {section_num: (heading_text, anchor)} - self.spec_sections: Dict[ - str, Dict[str, Tuple[str, str]] - ] = {} - - # Load index file - self.index_entries: Dict[str, str] = {} # method/type -> spec_file - self.index_link_texts: Dict[str, str] = {} # method/type -> link text for context - self.index_anchors: Dict[str, str] = {} # method/type -> anchor (e.g., "11-hashtype-type") - self._load_index() - - def _is_section_0_or_cross_reference(self, section_num: str, heading_text: str = "") -> bool: - """Check if a section is section 0 or a cross-reference section (not 
source of truth).""" - # Section 0 or sections starting with "0." are not source of truth - if section_num == "0" or section_num.startswith("0."): - return True - # Check if heading contains cross-reference keywords - if heading_text: - heading_lower = heading_text.lower() - if "cross-reference" in heading_lower or "cross-references" in heading_lower: - return True - if "overview" in heading_lower and section_num.startswith("0."): - return True - return False - - def _clean_heading(self, section_num: str, heading_text: str) -> str: - """Remove section number from heading if present, handling edge cases.""" - # Special case: if section is "0" and heading starts with "0. ", return just the text after - if section_num == "0": - if heading_text.startswith("0. "): - return heading_text[3:] - elif heading_text.startswith("0 "): - return heading_text[2:] - - # Remove section number prefix (e.g., "2.1 AddFile" -> "AddFile") - # Match the exact section number at the start - section_pattern = re.escape(section_num) + r'(?:\.\s+|\s+)' - heading_clean = re.sub(r'^' + section_pattern, '', heading_text) - - # If that didn't work, try generic pattern - if heading_clean == heading_text: - heading_clean = re.sub(r'^\d+(?:\.\d+)*\s+', '', heading_text) - - return heading_clean - - def _format_section_number( - self, section_num: str, heading_text: str - ) -> str: - """Format section number with period if original heading had one. - - Args: - section_num: The section number (e.g., "11", "2.1", "8.1.6") - heading_text: The full heading text (e.g., "11. HashType Type") - - Returns: - Section number formatted with period if original had one (e.g., "11." or "2.1") - """ - # Check if the original heading has a period after the section number - if heading_text.startswith(section_num + '.'): - return section_num + '.' 
- return section_num - - def _load_index(self): - """Load api_go_defs_index.md and extract method/type -> spec_file mappings with anchors.""" - if not self.index_file.exists(): - warning_msg = format_issue_message( - "warning", - "Index file not found", - str(self.index_file), - None, - "skipping index validation", - None, - False - ) - print(warning_msg) - return - - # Verify index file is within repo - if not self._is_safe_path(self.index_file): - warning_msg = format_issue_message( - "warning", - "Index file path unsafe", - str(self.index_file), - None, - "skipping index validation", - None, - False - ) - print(warning_msg) - return - - content = self.file_cache.get_content(self.index_file) - - # Pattern: **`Package.AddFile`** - [File Management API - AddFile] - # (api_file_mgmt_addition.md#anchor) - pattern = r'\*\*`([^`]+)`\*\*\s*-\s*\[([^\]]+)\]\(([^)]+)\)' - for match in re.finditer(pattern, content): - method_type = match.group(1) - link_text = match.group(2) - # e.g., "api_file_mgmt_addition.md#21-addfile" or - # "api_file_mgmt_addition.md" - link_target = match.group(3) - - # Extract spec file and anchor with security validation - if '#' in link_target: - spec_file, anchor = link_target.split('#', 1) - # Validate anchor is safe - if not self._validate_anchor(anchor): - continue # Skip this entry if anchor is unsafe - self.index_anchors[method_type] = anchor - else: - spec_file = link_target - self.index_anchors[method_type] = None - - # Validate spec file name is safe - if not self._validate_spec_file_name(spec_file): - continue # Skip this entry if filename is unsafe - - self.index_entries[method_type] = spec_file - self.index_link_texts[method_type] = link_text - - def _parse_markdown_anchors( - self, file_path: Path - ) -> Tuple[Set[str], Dict[str, Tuple[str, str]]]: - """Parse markdown file to extract all heading anchors and section numbers. 
- - Returns: - Tuple of (anchors set, sections dict where key is section_num and - value is (heading_text, anchor)) - """ - # Use shared utility function with cache - return extract_headings_with_section_numbers( - file_path, min_level=2, max_level=6, file_cache=self.file_cache - ) - - def _is_safe_path(self, file_path: Path) -> bool: - """Check if a path is safe (within repo and no traversal).""" - return is_safe_path(file_path, self.repo_root) - - def _validate_spec_file_name(self, spec_file: str) -> bool: - """Validate that spec file name is safe (no path traversal, no separators).""" - return validate_spec_file_name(spec_file) - - def _validate_anchor(self, anchor: str) -> bool: - """Validate that anchor is safe (no path traversal, no separators).""" - return validate_anchor(anchor) - - def _get_spec_file_path(self, spec_file: str) -> Optional[Path]: - """Get the full path to a spec file with security validation.""" - # Validate file name is safe - if not self._validate_spec_file_name(spec_file): - return None - - # Construct path within docs directory - file_path = self.docs_dir / spec_file - - # Verify the resolved path is within repo - if not self._is_safe_path(file_path): - return None - - return file_path - - def _validate_reference(self, ref: SpecReference) -> List[ValidationIssue]: - """Validate a single reference. 
Returns list of ValidationIssue objects (empty if valid).""" - errors: List[ValidationIssue] = [] - - # Try to find correct reference from index if we have a function name - if ref.function_name: - correct_ref = self._find_correct_reference_from_index(ref.function_name) - if correct_ref: - spec_file, section_num, heading = correct_ref - # Filter out section 0 and cross-reference sections - if not self._is_section_0_or_cross_reference(section_num, heading): - # heading is already cleaned, section_num is already formatted - ref.suggested_ref = f"{spec_file}: {section_num} {heading}" - - # First check format - if not ref.is_valid_format: - # If we have a parsed section from anchor, try to validate it - if ref.spec_file and ref.section: - spec_path = self._get_spec_file_path(ref.spec_file) - if spec_path and spec_path.exists(): - # Load sections if not cached - if ref.spec_file not in self.spec_sections: - _, sections = self._parse_markdown_anchors(spec_path) - self.spec_sections[ref.spec_file] = sections - - sections = self.spec_sections[ref.spec_file] - if ref.section in sections: - # Found the section! 
But check if it's section 0 or cross-reference - heading_text, _ = sections[ref.section] - if self._is_section_0_or_cross_reference(ref.section, heading_text): - # Don't suggest section 0 or cross-reference sections - # Try to find a better section - similar = [ - s for s in sections.keys() - if not self._is_section_0_or_cross_reference( - s, sections[s][0] - ) and ( - ref.section.startswith(s) or - s.startswith(ref.section) - ) - ] - if similar: - actual_heading, _ = sections[similar[0]] - heading_clean = self._clean_heading(similar[0], actual_heading) - section_formatted = self._format_section_number( - similar[0], actual_heading - ) - ref.suggested_ref = ( - f"{ref.spec_file}: {section_formatted} {heading_clean}" - ) - else: - # Valid section - generate correct format - heading_clean = self._clean_heading(ref.section, heading_text) - section_formatted = self._format_section_number( - ref.section, heading_text - ) - ref.suggested_ref = ( - f"{ref.spec_file}: {section_formatted} {heading_clean}" - ) - else: - # Section not found, try to find similar (skip section 0 and cross-refs) - similar = [ - s for s in sections.keys() - if not self._is_section_0_or_cross_reference( - s, sections[s][0] - ) and ( - ref.section.startswith(s) or - s.startswith(ref.section) - ) - ] - if similar: - actual_heading, _ = sections[similar[0]] - heading_clean = self._clean_heading(similar[0], actual_heading) - section_formatted = self._format_section_number( - similar[0], actual_heading - ) - ref.suggested_ref = ( - f"{ref.spec_file}: {section_formatted} {heading_clean}" - ) - - # Build error message - message_parts = [] - if not ref.suggested_ref: - message_parts.append( - "Invalid format. 
Expected: 'file_name.md: " - "section_number heading_text'" - ) - message_parts.append(f"Got: '{ref.raw_ref}'") - message_parts.append( - "Example: 'api_file_mgmt_addition.md: " - "2.1 AddFile Package Method'" - ) - else: - message_parts.append(f"Invalid reference: '{ref.raw_ref}'") - - # Check if spec file exists - if ref.spec_file: - spec_path = self._get_spec_file_path(ref.spec_file) - if not spec_path or not spec_path.exists(): - message_parts.append(f"Spec file not found: {ref.spec_file}") - - if message_parts: - errors.append(ValidationIssue( - "invalid_spec_ref_format", - ref.file_path, - ref.line_num, - ref.line_num, - " ".join(message_parts), - severity='error', - suggestion=ref.suggested_ref, - raw_ref=ref.raw_ref, - spec_file=ref.spec_file - )) - return errors - - if not ref.spec_file: - errors.append(ValidationIssue( - "missing_spec_file", - ref.file_path, - ref.line_num, - ref.line_num, - "No spec file specified in reference", - severity='error', - raw_ref=ref.raw_ref - )) - return errors - - spec_path = self._get_spec_file_path(ref.spec_file) - if not spec_path or not spec_path.exists(): - errors.append(ValidationIssue( - "spec_file_not_found", - ref.file_path, - ref.line_num, - ref.line_num, - f"Spec file not found: {ref.spec_file}", - severity='error', - raw_ref=ref.raw_ref, - spec_file=ref.spec_file - )) - return errors - - # Load anchors and sections for this file if not cached - if ref.spec_file not in self.spec_anchors: - anchors, sections = self._parse_markdown_anchors(spec_path) - self.spec_anchors[ref.spec_file] = anchors - self.spec_sections[ref.spec_file] = sections - - # Validate section exists - sections = self.spec_sections[ref.spec_file] - if ref.section not in sections: - # Check for similar section numbers (skip section 0 and cross-refs) - similar = [ - s for s in sections.keys() - if not self._is_section_0_or_cross_reference(s, sections[s][0]) and ( - ref.section.startswith(s) or s.startswith(ref.section) - ) - ] - if similar: - 
actual_heading, _ = sections[similar[0]] - # Remove section number from heading if present - heading_clean = self._clean_heading(similar[0], actual_heading) - if not ref.suggested_ref: - ref.suggested_ref = ( - f"{ref.spec_file}: {similar[0]} {heading_clean}" - ) - message = ( - f"Section '{ref.section}' not found. " - f"Did you mean: '{similar[0]} {actual_heading}'?" - ) - else: - message = f"Section '{ref.section}' not found in {ref.spec_file}" - # Show available sections (excluding section 0 and cross-refs) - available_sections = [] - if sections: - available = [ - (num, heading) for num, (heading, _) in sorted(sections.items()) - if not self._is_section_0_or_cross_reference(num, heading) - ][:5] - available_sections = [f"{num} {heading}" for num, heading in available] - if available_sections: - message += f" Available sections: {', '.join(available_sections)}..." - - errors.append(ValidationIssue( - "section_not_found", - ref.file_path, - ref.line_num, - ref.line_num, - message, - severity='error', - suggestion=ref.suggested_ref, - raw_ref=ref.raw_ref, - spec_file=ref.spec_file, - section=ref.section - )) - return errors - - # Validate heading matches (or is close to) the actual heading - actual_heading, actual_anchor = sections[ref.section] - - # Normalize both headings for comparison (lowercase, remove extra spaces) - normalized_ref = re.sub(r'\s+', ' ', ref.heading.lower().strip()) - normalized_actual = re.sub(r'\s+', ' ', actual_heading.lower().strip()) - - # Check if they match (allowing for some flexibility) - if normalized_ref != normalized_actual: - # Check if the reference heading is a substring of the actual heading - # (e.g., "AddFile" matches "2.1 AddFile Package Method") - if normalized_ref not in normalized_actual: - # Check if actual heading contains the reference heading (reverse) - if normalized_actual not in normalized_ref: - # Remove section number from heading if present - heading_clean = self._clean_heading(ref.section, actual_heading) - if 
not ref.suggested_ref: - ref.suggested_ref = f"{ref.spec_file}: {ref.section} {heading_clean}" - errors.append( - f" Heading mismatch for section {ref.section}" - ) - errors.append(f" Expected: '{actual_heading}'") - errors.append(f" Got: '{ref.heading}'") - errors.append( - f" Correct format: '{ref.spec_file}: {ref.section} {actual_heading}'" - ) - - return errors - - def _extract_function_or_type_name(self, go_file: Path, line_num: int) -> Optional[str]: - """Extract function or type name from Go code context around the specification comment.""" - try: - lines = self.file_cache.get_lines(go_file) - - # Look backwards from the comment line to find function/type definition - # Check up to 30 lines before the comment - start_line = max(0, line_num - 30) - context = ''.join(lines[start_line:line_num]) - - # Pattern 1: func (receiver) MethodName(...) - most common - # Try to get receiver type too for better matching - match = re.search(r'func\s+\([^)]*(\w+)[^)]*\)\s+([A-Z][a-zA-Z0-9_]+)\s*\(', context) - if match: - receiver = match.group(1) - method = match.group(2) - # Try full qualified name first (e.g., "Package.AddFile") - if receiver and receiver[0].isupper(): - return f"{receiver}.{method}" - return method - - # Pattern 2: func FunctionName(...) 
- match = re.search(r'func\s+([A-Z][a-zA-Z0-9_]+)\s*\(', context) - if match: - return match.group(1) - - # Pattern 3: type TypeName - match = re.search(r'type\s+([A-Z][a-zA-Z0-9_]+)\s+(?:struct|interface|\[|$)', context) - if match: - return match.group(1) - - # Pattern 4: const ConstName - match = re.search(r'const\s+([A-Z][a-zA-Z0-9_]+)\s*=', context) - if match: - return match.group(1) - - # Pattern 5: var VarName - match = re.search(r'var\s+([A-Z][a-zA-Z0-9_]+)\s*=', context) - if match: - return match.group(1) - - return None - except (IOError, OSError, UnicodeDecodeError): - # File read errors - return None silently as this is a helper function - return None - except Exception as e: - # Unexpected errors - log but don't fail - if self.verbose: - print(f"Warning: Unexpected error extracting function name: {e}", file=sys.stderr) - return None - - def _parse_anchor_to_section_and_heading( - self, anchor: str, spec_file: str - ) -> Optional[Tuple[str, str]]: - """Parse an anchor back to section number and heading by looking it up in the spec file. - - Since anchors are generated by removing dots from section numbers, we can't reliably - reverse them (e.g., "11" could be section "11" or "1.1"). Instead, we look up the - actual heading in the spec file that matches this anchor. - - Args: - anchor: The anchor string (e.g., "11-hashtype-type") - spec_file: The spec file name (e.g., "api_file_mgmt_file_entry.md") - - Returns: - Tuple of (section_num, heading_text) or None if not found - Never returns section 0 or cross-reference sections. 
- """ - if not anchor: - return None - - # Validate anchor is safe - if not self._validate_anchor(anchor): - return None - - # Validate spec file name is safe - if not self._validate_spec_file_name(spec_file): - return None - - spec_path = self._get_spec_file_path(spec_file) - if not spec_path or not spec_path.exists(): - return None - - # Double-check path is safe after resolution - if not self._is_safe_path(spec_path): - return None - - # Load sections if not cached - if spec_file not in self.spec_sections: - _, sections = self._parse_markdown_anchors(spec_path) - self.spec_sections[spec_file] = sections - - sections = self.spec_sections[spec_file] - - # Find the section that has this anchor, but skip section 0 and cross-references - for section_num, (heading_text, heading_anchor) in sections.items(): - if heading_anchor == anchor: - # Filter out section 0 and cross-reference sections - if not self._is_section_0_or_cross_reference(section_num, heading_text): - return (section_num, heading_text) - - # If not found in cached sections, try generating the anchor from all headings - # and match it using shared utility function - try: - # Use shared utility to extract headings with section anchors - headings = extract_h2_plus_headings_with_sections( - spec_path, file_cache=self.file_cache - ) - - for heading_level, heading_text, line_num, plain_anchor, section_anchor in headings: - # Check if the section_anchor matches (this is the format: "123-heading-text") - if section_anchor == anchor: - # Extract section number from heading text - section_match = re.match(r'^(\d+(?:\.\d+)*)', heading_text) - if section_match: - section_num = section_match.group(1) - # Filter out section 0 and cross-reference sections - if not self._is_section_0_or_cross_reference(section_num, heading_text): - return (section_num, heading_text) - - # Also check plain anchor as fallback - if plain_anchor == anchor: - # Extract section number from heading text - section_match = 
re.match(r'^(\d+(?:\.\d+)*)', heading_text) - if section_match: - section_num = section_match.group(1) - # Filter out section 0 and cross-reference sections - if not self._is_section_0_or_cross_reference(section_num, heading_text): - return (section_num, heading_text) - except (IOError, OSError, UnicodeDecodeError): - # File read errors - return None silently - pass - except Exception as e: - # Unexpected errors - log but don't fail - if self.verbose: - print(f"Warning: Unexpected error parsing anchor: {e}", file=sys.stderr) - pass - - return None - - def _try_exact_match(self, function_name: str) -> Optional[Tuple[str, str, str]]: - """Try to find an exact match for the function name in the index.""" - if function_name not in self.index_entries: - return None - - spec_file = self.index_entries[function_name] - anchor = self.index_anchors.get(function_name) - if anchor: - parsed = self._parse_anchor_to_section_and_heading(anchor, spec_file) - if parsed: - section_num, heading = parsed - # Filter out section 0 and cross-reference sections - if not self._is_section_0_or_cross_reference(section_num, heading): - heading_clean = self._clean_heading(section_num, heading) - section_formatted = self._format_section_number(section_num, heading) - return (spec_file, section_formatted, heading_clean) - - # Fallback to old method if no anchor - fallback = self._find_section_for_spec_file(spec_file, function_name) - if fallback: - spec_file_fb, section_num, heading_text = fallback - # Filter out section 0 and cross-reference sections - if not self._is_section_0_or_cross_reference(section_num, heading_text): - heading_clean = self._clean_heading(section_num, heading_text) - section_formatted = self._format_section_number(section_num, heading_text) - return (spec_file_fb, section_formatted, heading_clean) - return None - - def _calculate_match_score(self, function_base: str, index_name: str) -> int: - """Calculate match score for a function name against an index entry. 
- - Returns: - Score: 100 = exact base match, 50 = receiver match, 0 = no match - """ - index_base = index_name.split('.')[-1] if '.' in index_name else index_name - - if function_base == index_base: - # Exact base name match (e.g., "AddFile" == "AddFile") - return 100 - elif index_name.endswith('.' + function_base): - # Receiver match (e.g., "Package.AddFile" ends with ".AddFile") - return 50 - return 0 - - def _create_candidate_from_anchor( - self, anchor: str, spec_file: str, score: int - ) -> Optional[Tuple[int, str, str, str]]: - """Create a candidate tuple from an anchor match.""" - parsed = self._parse_anchor_to_section_and_heading(anchor, spec_file) - if not parsed: - return None - - section_num, heading = parsed - # Filter out section 0 and cross-reference sections - if self._is_section_0_or_cross_reference(section_num, heading): - return None - - heading_clean = self._clean_heading(section_num, heading) - section_formatted = self._format_section_number(section_num, heading) - return (score, spec_file, section_formatted, heading_clean) - - def _create_candidate_from_fallback( - self, fallback: Tuple[str, str, str], score: int - ) -> Optional[Tuple[int, str, str, str]]: - """Create a candidate tuple from a fallback match.""" - spec_file_fb, section_num, heading_text = fallback - # Filter out section 0 and cross-reference sections - if self._is_section_0_or_cross_reference(section_num, heading_text): - return None - - heading_clean = self._clean_heading(section_num, heading_text) - section_formatted = self._format_section_number(section_num, heading_text) - return (score, spec_file_fb, section_formatted, heading_clean) - - def _find_partial_matches( - self, function_base: str - ) -> List[Tuple[int, str, str, str]]: - """Find partial matches for a function name in the index. 
- - Returns: - List of candidate tuples: (score, spec_file, section_formatted, heading_clean) - """ - candidates = [] - - for index_name, spec_file in self.index_entries.items(): - score = self._calculate_match_score(function_base, index_name) - if score == 0: - continue - - anchor = self.index_anchors.get(index_name) - if anchor: - candidate = self._create_candidate_from_anchor(anchor, spec_file, score) - if candidate: - candidates.append(candidate) - else: - # Fallback to old method if no anchor - fallback = self._find_section_for_spec_file(spec_file, index_name) - if fallback: - candidate = self._create_candidate_from_fallback(fallback, score) - if candidate: - candidates.append(candidate) - - return candidates - - def _find_correct_reference_from_index( - self, function_name: str - ) -> Optional[Tuple[str, str, str]]: - """Find correct reference from index. Returns (spec_file, section_num, heading) or None. - - Now uses anchors from the index file directly, assuming they are correct. - """ - if not function_name: - return None - - # Try exact match first - exact_match = self._try_exact_match(function_name) - if exact_match: - return exact_match - - # Try partial matches (e.g., "AddFile" matches "Package.AddFile") - # Extract base name (last component after dot) - function_base = function_name.split('.')[-1] if '.' 
in function_name else function_name - - candidates = self._find_partial_matches(function_base) - - # Return the best match (highest score, first in case of ties) - if candidates: - # Sort by score descending, then return first - candidates.sort(key=lambda x: x[0], reverse=True) - _, spec_file, section_formatted, heading_clean = candidates[0] - return (spec_file, section_formatted, heading_clean) - - return None - - def _find_section_for_spec_file( - self, spec_file: str, context: str - ) -> Optional[Tuple[str, str, str]]: - """Find the section number and heading for a spec file based on context.""" - spec_path = self._get_spec_file_path(spec_file) - if not spec_path or not spec_path.exists(): - return None - - # Load sections if not cached - if spec_file not in self.spec_sections: - _, sections = self._parse_markdown_anchors(spec_path) - self.spec_sections[spec_file] = sections - - sections = self.spec_sections[spec_file] - - # Extract function name from context (e.g., "Package.AddFile" -> "AddFile") - function_name = context.split('.')[-1] if '.' in context else context - - # Try to find a section that matches the function/type name - # Look for sections that contain the function name or keywords from it - context_lower = context.lower() - function_name_lower = function_name.lower() - context_words = set(re.findall(r'[a-zA-Z]+', context_lower)) - - best_match = None - best_score = 0 - - for section_num, (heading_text, _) in sections.items(): - # Skip section 0 and cross-reference sections - they are not source of truth - if self._is_section_0_or_cross_reference(section_num, heading_text): - continue - - heading_lower = heading_text.lower() - heading_words = set(re.findall(r'[a-zA-Z]+', heading_lower)) - - # Score based on: - # 1. Exact function name match in heading (highest priority) - # 2. 
Word overlap - score = 0 - if function_name_lower in heading_lower: - score += 100 # Strong match - if function_name in heading_text: - score += 50 # Case-sensitive match - - # Word overlap score - overlap = len(context_words & heading_words) - score += overlap * 10 - - if score > best_score: - best_score = score - best_match = (spec_file, section_num, heading_text) - - # If we found a good match (not section 0), return it - if best_match and best_score >= 10: # Require at least some match - return best_match - - # Otherwise, try to find a section with the function name in the content - # by reading the spec file and searching for the function name - try: - lines = self.file_cache.get_lines(spec_path) - - # Search for function name in headings or near headings - for i, line in enumerate(lines): - if re.match(r'^#{2,6}\s+', line): # This is a heading - # Check if function name appears in nearby content - nearby = ' '.join(lines[max(0, i):min(len(lines), i + 10)]) - if function_name in nearby or function_name_lower in nearby.lower(): - # Extract section number from this heading - heading_match = re.match(r'^#{2,6}\s+(.+)', line) - if heading_match: - heading_text = heading_match.group(1).strip() - section_match = re.match(r'^(\d+(?:\.\d+)*)', heading_text) - if section_match: - section_num = section_match.group(1) - # Filter out section 0 and cross-reference sections - if not self._is_section_0_or_cross_reference( - section_num, heading_text): - return (spec_file, section_num, heading_text) - except (IOError, OSError, UnicodeDecodeError): - # File read errors - continue to fallback logic - pass - except Exception as e: - # Unexpected errors - log but don't fail - if self.verbose: - print(f"Warning: Unexpected error finding section: {e}", file=sys.stderr) - pass - - # Try to find first non-zero, non-cross-reference section - if sections: - for section_num, (heading_text, _) in sorted(sections.items()): - if not self._is_section_0_or_cross_reference(section_num, 
heading_text): - return (spec_file, section_num, heading_text) - - # Never return section 0 or cross-reference sections - return None - - def find_spec_references(self, go_file: Path) -> List[SpecReference]: - """Find all specification references in a Go file.""" - references = [] - - try: - lines = self.file_cache.get_lines(go_file) - for line_num, line in enumerate(lines, 1): - # Match "// Specification: ..." comments - match = re.search(r'//\s*Specification:\s*(.+)', line) - if match: - ref_text = match.group(1).strip() - # Handle multi-line references (some may have continuation) - if ref_text: - ref = SpecReference(go_file, line_num, ref_text) - # Try to extract function/type name for index lookup - ref.function_name = ( - self._extract_function_or_type_name( - go_file, line_num - ) - ) - references.append(ref) - except (IOError, OSError) as e: - # File read errors - create ValidationIssue - error = ValidationIssue( - "file_read_error", - go_file, - 0, - 0, - f"Could not read file: {e}", - severity='error' - ) - self.issues.append(error) - except UnicodeDecodeError as e: - # Encoding errors - create ValidationIssue - error = ValidationIssue( - "file_encoding_error", - go_file, - 0, - 0, - f"Could not decode file (encoding issue): {e}", - severity='error' - ) - self.issues.append(error) - except Exception as e: - # Unexpected errors - create ValidationIssue - error = ValidationIssue( - "unexpected_error", - go_file, - 0, - 0, - f"Unexpected error reading file: {e}", - severity='error' - ) - self.issues.append(error) - - return references - - def validate_all( - self, check_index: bool = False, verbose: bool = False, output=None - ) -> Tuple[int, List[str]]: - """Validate all references in Go files. 
Returns (error_count, error_messages).""" - error_count = 0 - error_messages: List[str] = [] - all_issues: List[ValidationIssue] = [] - - # Find all Go files - go_files = list(self.api_go_dir.rglob("*.go")) - - if output: - output.add_verbose_line( - f"Scanning {len(go_files)} Go files for specification references..." - ) - else: - print(f"Scanning {len(go_files)} Go files for specification references...") - - for go_file in go_files: - references = self.find_spec_references(go_file) - if not references: - continue - - if verbose: - rel_path = go_file.relative_to(self.repo_root) - if output: - output.add_verbose_line( - f" Checking {rel_path} ({len(references)} reference(s))" - ) - else: - print(f" Checking {rel_path} ({len(references)} reference(s))") - - for ref in references: - if verbose: - if output: - output.add_verbose_line(f" Validating: {ref.raw_ref}") - else: - print(f" Validating: {ref.raw_ref}") - errors = self._validate_reference(ref) - if errors: - error_count += len(errors) - all_issues.extend(errors) - # Format error messages for output - for error in errors: - if isinstance(error, ValidationIssue): - error_messages.append(error.format_message(no_color=False)) - else: - error_messages.append(str(error)) - - return error_count, error_messages +from lib._validate_go_spec_references_validator import SpecValidator def main(): @@ -1066,7 +75,6 @@ def main(): args = parser.parse_args() - # Create output builder (header streams immediately if verbose) no_color = args.nocolor or parse_no_color_flag(sys.argv) output = OutputBuilder( "Go Specification References Validation", @@ -1086,7 +94,7 @@ def main(): output.add_blank_line("working_verbose") error_count, error_messages = validator.validate_all( - check_index=args.check_index, verbose=args.verbose, output=output + _check_index=args.check_index, verbose=args.verbose, output=output ) if error_messages: @@ -1098,7 +106,6 @@ def main(): f"Found {error_count} error(s)", section="error" ) - # Check if any 
messages contain suggestions (-> format) has_suggestions = any(" -> " in msg for msg in error_messages) if has_suggestions: output.add_blank_line("error") @@ -1108,12 +115,15 @@ def main(): section="error" ) output.add_failure_message("Validation failed. Please fix the errors above.") - output.print() - return output.get_exit_code(args.no_fail) + elif output.has_warnings(): + output.add_warnings_only_message( + verbose_hint="Run with --verbose to see the full warning details.", + ) + else: + output.add_success_message("All specification references are valid!") - output.add_success_message("All specification references are valid!") output.print() - return 0 + return output.get_exit_code(args.no_fail) if __name__ == "__main__": diff --git a/scripts/validate_go_spec_signature_consistency.py b/scripts/validate_go_spec_signature_consistency.py index 2dce9109..55a25463 100644 --- a/scripts/validate_go_spec_signature_consistency.py +++ b/scripts/validate_go_spec_signature_consistency.py @@ -45,108 +45,29 @@ import sys from collections import defaultdict from pathlib import Path -from typing import Dict, List, Optional, Tuple +from typing import Callable, Dict, List, Optional, Set, Tuple -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" - -# Import shared utilities -for module_path in (str(scripts_dir), str(lib_dir)): - if module_path not in sys.path: - sys.path.insert(0, module_path) - -from lib._validation_utils import ( # noqa: E402 - OutputBuilder, parse_no_color_flag, - find_markdown_files, parse_paths, get_workspace_root, +from lib._validation_utils import ( + OutputBuilder, find_markdown_files, parse_paths, get_workspace_root, get_validation_exit_code, HeadingContext, find_heading_before_line, ValidationIssue, DOCS_DIR, TECH_SPECS_DIR ) -from lib._go_code_utils import ( # noqa: E402 - parse_go_def_signature, - is_example_code, is_example_signature_name, - find_go_code_blocks, Signature, is_public_name, +from lib._go_code_utils import ( + 
is_example_signature_name, + find_go_code_blocks, Signature, extract_interfaces_from_markdown, - InterfaceParser, normalize_go_signature + normalize_go_signature +) +from lib._validate_go_spec_signature_consistency_helpers import ( + extract_signatures_from_block as _extract_signatures_from_block, + parse_cli_args, + build_output, + split_issues, + emit_issues, + emit_summary, + emit_final_message, ) - -# Compiled regex patterns for performance (module level) -_RE_INTERFACE_PATTERN = re.compile(r'^\s*type\s+\w+(?:\s*\[[^\]]+\])?\s+interface\s*\{') -_RE_STRUCT_PATTERN = re.compile(r'^\s*type\s+(\w+)(?:\s*(\[[^\]]+\]))?\s+struct\s*\{') - - -def count_interface_methods(content: str, start_line: int, end_line: int) -> int: - """ - Count methods in an interface definition within a specific line range. - - This is a specialized lightweight utility that uses the shared InterfaceParser - class for brace depth tracking. It's simpler than the full extraction helpers - since it only needs to count methods, not extract full signatures. - - Note: This function is currently not called but kept for potential future use. - If needed, it could be refactored to use extract_interfaces_from_markdown(), - but the current implementation is acceptable as it uses shared utilities. 
- """ - lines = content.split('\n') - method_count = 0 - interface_parser = InterfaceParser() - - for i in range(start_line - 1, min(end_line, len(lines))): - line = lines[i] - - # Check for interface start using InterfaceParser - interface_name = interface_parser.check_interface_start(line) - if interface_name: - continue - - if interface_parser.is_in_interface(): - still_in_interface = interface_parser.update_brace_depth(line) - # Check for method signature if still in interface body - if still_in_interface and interface_parser.brace_depth > 0: - sig = parse_go_def_signature(line, location="") - if sig and sig.kind in ('func', 'method'): - method_count += 1 - - if not still_in_interface: - break - - return method_count - - -def count_struct_fields(content: str, start_line: int, end_line: int) -> int: - """Count fields in a struct definition.""" - lines = content.split('\n') - field_count = 0 - in_struct = False - brace_depth = 0 - - for i in range(start_line - 1, min(end_line, len(lines))): - line = lines[i] - stripped = line.strip() - - # Check for struct start - if re.match(r'^\s*type\s+\w+\s+struct\s*\{', line): - in_struct = True - brace_depth = stripped.count('{') - stripped.count('}') - continue - - if in_struct: - brace_depth += stripped.count('{') - stripped.count('}') - # Check for field (not a comment, not empty, not a method) - if brace_depth > 0 and stripped and not stripped.startswith('//'): - # Simple heuristic: if it looks like a field (has identifier and type) - if re.match(r'^\s*\w+\s+\w+', stripped): - # Make sure it's not a method - if not re.match(r'^\s*func\s+', stripped): - field_count += 1 - - if brace_depth <= 0: - break - - return field_count - - -# is_example_signature_name now imported from _go_code_utils def extract_signatures_from_markdown_file(file_path: Path, repo_root: Path) -> List[Signature]: @@ -154,122 +75,23 @@ def extract_signatures_from_markdown_file(file_path: Path, repo_root: Path) -> L signatures = [] try: - # Get 
relative path from repo root try: relative_path = file_path.resolve().relative_to(repo_root.resolve()) except ValueError: - # If path is not under repo_root, use absolute path as fallback relative_path = file_path.resolve() content = file_path.read_text(encoding='utf-8') lines = content.split('\n') - - # Use shared helper to extract interfaces and their methods interface_signatures = extract_interfaces_from_markdown( content, file_path, start_line=1, parse_methods=True, skip_examples=True, lines=lines ) signatures.extend(interface_signatures) - - # Extract other signatures (structs, functions, methods, types) that aren't interfaces go_blocks = find_go_code_blocks(content) - - for start_line, end_line, code_content in go_blocks: + for start_line, _end_line, code_content in go_blocks: block_lines = code_content.split('\n') - - for i, line in enumerate(block_lines): - line_num = start_line + i - stripped = line.strip() - - # Skip empty lines and comments - if not stripped or stripped.startswith('//'): - continue - - # Skip interface definitions (already handled by helper) - if _RE_INTERFACE_PATTERN.match(stripped): - continue - - # Check if this is an example signature using the library function - is_example = is_example_code( - code_content, start_line, - lines=lines, - check_single_line=i - ) - - # Check for struct start - struct_match = _RE_STRUCT_PATTERN.match(stripped) - if struct_match: - # Skip if this is an example - if is_example: - continue - - name = struct_match.group(1) - generic_params = struct_match.group(2) # e.g., "[T any]" - is_public = is_public_name(name) if name else False - brace_depth = stripped.count('{') - stripped.count('}') - has_full_body = brace_depth > 0 - - # Count fields if it's a full struct - field_count = 0 - if has_full_body: - # Count fields within this code block - temp_brace_depth = brace_depth - for j in range(i + 1, len(block_lines)): - temp_line = block_lines[j] - temp_stripped = temp_line.strip() - if not temp_stripped or 
temp_stripped.startswith('//'): - continue - temp_brace_depth += temp_stripped.count('{') - temp_stripped.count('}') - if temp_brace_depth > 0 and temp_stripped: - # Simple heuristic: if it looks like a field - # (has identifier and type) - if re.match(r'^\s*\w+\s+\w+', temp_stripped): - # Make sure it's not a method - if not re.match(r'^\s*func\s+', temp_stripped): - field_count += 1 - if temp_brace_depth <= 0: - break - - signatures.append(Signature( - name=name, - kind='type', - location=f"{relative_path}:{line_num}", - is_public=is_public, - has_body=has_full_body, - field_count=field_count, - generic_params=generic_params - )) - continue - - # Check for any Go definition (function, method, or type) - sig = parse_go_def_signature(line, location=f"{relative_path}:{line_num}") - if sig: - if sig.kind in ('func', 'method'): - # Standalone method/function definitions are full - signatures.append(Signature( - name=sig.name, - kind=sig.kind, - receiver=sig.receiver, - params=sig.params, - returns=sig.returns, - location=f"{relative_path}:{line_num}", - is_public=sig.is_public, - has_body=True - )) - else: - # Type definition (not interface/struct, already handled) - # Skip if this is an example - if is_example: - continue - - if sig.kind != 'interface': # Interfaces already handled - signatures.append(Signature( - name=sig.name, - kind=sig.kind, - location=f"{relative_path}:{line_num}", - is_public=sig.is_public, - has_body=False, # Type aliases don't have bodies - generic_params=sig.generic_params - )) + signatures.extend(_extract_signatures_from_block( + block_lines, start_line, relative_path, code_content, lines + )) except (IOError, OSError) as e: # File read errors - log to stderr @@ -277,7 +99,7 @@ def extract_signatures_from_markdown_file(file_path: Path, repo_root: Path) -> L except UnicodeDecodeError as e: # Encoding errors - log to stderr print(f"Warning: Could not decode {file_path} (encoding issue): {e}", file=sys.stderr) - except Exception as e: + except 
(ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: # Unexpected errors - log to stderr print(f"Warning: Unexpected error reading {file_path}: {e}", file=sys.stderr) @@ -308,10 +130,9 @@ def extract_heading_context(file_path: Path, line_num: int) -> Optional[HeadingC except (ValueError, IndexError, KeyError): # Data structure errors - return None silently pass - except Exception as e: + except (TypeError, AttributeError, RuntimeError) as e: # Unexpected errors - log but don't fail print(f"Warning: Unexpected error reading {file_path}: {e}", file=sys.stderr) - pass return None @@ -330,6 +151,89 @@ def get_signature_heading_context(sig: Signature, repo_root: Path) -> Optional[H return None +def _canonical_score_heading(heading_ctx: Optional[HeadingContext]) -> Tuple[float, List[str]]: + """Return (score_delta, reasons) for heading level.""" + if not heading_ctx: + return (0.0, []) + heading_score = max(0.1, 1.0 - (heading_ctx.heading_level - 1) * 0.2) + return (heading_score * 0.4, [f"heading level {heading_ctx.heading_level}"]) + + +def _canonical_score_body(sig: Signature) -> Tuple[float, List[str]]: + """Return (score_delta, reasons) for has_body.""" + if sig.has_body: + return (0.2, ["has body"]) + return (0.0, []) + + +def _canonical_score_first_occurrence( + sig: Signature, all_sigs: List[Signature] +) -> Tuple[float, List[str]]: + """Return (score_delta, reasons) for first occurrence in file.""" + file_path = sig.location.split(':', 1)[0] + same_file_sigs = [s for s in all_sigs if s.location.startswith(file_path)] + line_nums = [] + for s in same_file_sigs: + try: + line_nums.append(int(s.location.split(':', 1)[1])) + except (ValueError, IndexError): + pass + if not line_nums: + return (0.0, []) + try: + sig_line = int(sig.location.split(':', 1)[1]) + if sig_line == min(line_nums): + return (0.15, ["first occurrence in file"]) + except (ValueError, IndexError): + pass + return (0.0, []) + + +def _canonical_score_file_name(file_path: str) -> 
Tuple[float, List[str]]: + """Return (score_delta, reasons) for file name (core/basic vs advanced).""" + file_name = file_path.lower() + if 'core' in file_name or 'basic' in file_name: + return (0.1, ["core/basic file"]) + if 'advanced' in file_name or 'extended' in file_name: + return (-0.05, ["advanced/extended file"]) + return (0.0, []) + + +def _canonical_score_heading_keywords( + sig: Signature, heading_ctx: Optional[HeadingContext] +) -> Tuple[float, List[str]]: + """Return (score_delta, reasons) for heading keywords and name match.""" + if not heading_ctx: + return (0.0, []) + heading_lower = heading_ctx.heading_text.lower() + sig_name_lower = sig.name.lower() + score = 0.0 + reasons = [] + if sig_name_lower in heading_lower: + score += 0.15 + reasons.append("signature name in heading") + if 'definition' in heading_lower or 'definitions' in heading_lower: + score += 0.1 + reasons.append("definition keyword in heading") + if sig.kind in ('type', 'interface'): + if any(kw in heading_lower for kw in ['type', 'struct', 'types', 'interfaces']): + score += 0.1 + reasons.append("type-related keyword in heading") + elif sig.kind == 'func': + func_kw = ['function', 'functions', 'func', 'operation', 'operations'] + if any(kw in heading_lower for kw in func_kw): + score += 0.1 + reasons.append("function-related keyword in heading") + elif sig.kind == 'method' and sig.receiver: + if any(kw in heading_lower for kw in ['method', 'methods']): + score += 0.1 + reasons.append("method-related keyword in heading") + if sig.receiver.lower() in heading_lower: + score += 0.1 + reasons.append("receiver type in heading") + return (score, reasons) + + def score_canonical_signature( sig: Signature, heading_ctx: Optional[HeadingContext], @@ -343,90 +247,22 @@ def score_canonical_signature( Score ranges from 0.0 to 1.0. 
""" score = 0.0 - reasons = [] - - # Prefer less deeply nested headings (lower level = higher score) - if heading_ctx: - # H1 = 1.0, H2 = 0.8, H3 = 0.6, H4 = 0.4, H5 = 0.2, H6 = 0.1 - heading_score = max(0.1, 1.0 - (heading_ctx.heading_level - 1) * 0.2) - score += heading_score * 0.4 # 40% weight - reasons.append(f"heading level {heading_ctx.heading_level}") - - # Prefer signatures with bodies (more complete definitions) - if sig.has_body: - score += 0.2 # 20% weight - reasons.append("has body") - - # Prefer earlier line numbers (first occurrence) - file_path = sig.location.split(':', 1)[0] - same_file_sigs = [s for s in all_sigs if s.location.startswith(file_path)] - if same_file_sigs: - line_nums = [] - for s in same_file_sigs: - try: - line_nums.append(int(s.location.split(':', 1)[1])) - except (ValueError, IndexError): - pass - if line_nums: - try: - sig_line = int(sig.location.split(':', 1)[1]) - if sig_line == min(line_nums): - score += 0.15 # 15% weight - reasons.append("first occurrence in file") - except (ValueError, IndexError): - pass - - # Prefer files with more general names (e.g., api_core.md over api_core_advanced.md) - file_name = file_path.lower() - if 'core' in file_name or 'basic' in file_name: - score += 0.1 # 10% weight - reasons.append("core/basic file") - elif 'advanced' in file_name or 'extended' in file_name: - score -= 0.05 # Penalty - reasons.append("advanced/extended file") - - # Prefer signatures in sections with relevant keywords and signature name - if heading_ctx: - heading_lower = heading_ctx.heading_text.lower() - sig_name_lower = sig.name.lower() - - # Check for signature name in heading - if sig_name_lower in heading_lower: - score += 0.15 # 15% weight - reasons.append("signature name in heading") - - # "definition" or "definitions" applies to all signature types - if 'definition' in heading_lower or 'definitions' in heading_lower: - score += 0.1 # 10% weight - reasons.append("definition keyword in heading") - - # For 
type/interface/struct definitions: look for type-related keywords - if sig.kind in ('type', 'interface'): - type_keywords = ['type', 'struct', 'types', 'interfaces'] - if any(keyword in heading_lower for keyword in type_keywords): - score += 0.1 # 10% weight - reasons.append("type-related keyword in heading") - - # For functions: look for function-related keywords - elif sig.kind == 'func': - func_keywords = ['function', 'functions', 'func', 'operation', 'operations'] - if any(keyword in heading_lower for keyword in func_keywords): - score += 0.1 # 10% weight - reasons.append("function-related keyword in heading") - - # For methods: look for method-related keywords and receiver type - elif sig.kind == 'method' and sig.receiver: - method_keywords = ['method', 'methods'] - if any(keyword in heading_lower for keyword in method_keywords): - score += 0.1 # 10% weight - reasons.append("method-related keyword in heading") - - # Also check if receiver type name is in heading - receiver_lower = sig.receiver.lower() - if receiver_lower in heading_lower: - score += 0.1 # 10% weight - reasons.append("receiver type in heading") - + reasons: List[str] = [] + s, r = _canonical_score_heading(heading_ctx) + score += s + reasons.extend(r) + s, r = _canonical_score_body(sig) + score += s + reasons.extend(r) + s, r = _canonical_score_first_occurrence(sig, all_sigs) + score += s + reasons.extend(r) + s, r = _canonical_score_file_name(sig.location.split(':', 1)[0]) + score += s + reasons.extend(r) + s, r = _canonical_score_heading_keywords(sig, heading_ctx) + score += s + reasons.extend(r) return (min(1.0, max(0.0, score)), ", ".join(reasons) if reasons else "no specific indicators") @@ -517,10 +353,361 @@ def find_canonical_definition(signatures: List[Signature]) -> Optional[Signature return signatures[0] +def _group_types_by_name(all_types: List[Signature]) -> Dict[str, List[Signature]]: + """Group type/interface signatures by base name.""" + by_name: Dict[str, List[Signature]] = {} + 
for sig in all_types: + if sig.name not in by_name: + by_name[sig.name] = [] + by_name[sig.name].append(sig) + return by_name + + +def _append_duplicate_type_issue( + name: str, + sigs: List[Signature], + get_first_location: Callable[[List[str]], Tuple[Path, int]], + issues: List[ValidationIssue], +) -> None: + """Append one duplicate_type issue for identical type/interface.""" + locations = [sig.location for sig in sigs] + file_path, line_num = get_first_location(locations) + message_parts = [f"Duplicate identical type/interface for '{name}':"] + for sig in sigs: + message_parts.append(f" Location: {sig.location}") + issues.append(ValidationIssue.create( + "duplicate_type", + file_path, + line_num, + line_num, + message="\n".join(message_parts), + severity='warning', + type_name=name, + locations=locations + )) + + +def _append_type_conflict_issue( + name: str, + normalized_sigs: Dict[str, List[Signature]], + get_first_location: Callable[[List[str]], Tuple[Path, int]], + issues: List[ValidationIssue], +) -> None: + """Append one conflicting_type_definitions issue.""" + locations = [ + sig.location + for sig_list in normalized_sigs.values() + for sig in sig_list + ] + file_path, line_num = get_first_location(locations) + message_parts = [f"Conflicting type definitions for '{name}':"] + for norm_sig, sig_list in normalized_sigs.items(): + message_parts.append(f" {norm_sig}:") + for sig in sig_list: + message_parts.append(f" Location: {sig.location}") + issues.append(ValidationIssue.create( + "conflicting_type_definitions", + file_path, + line_num, + line_num, + message="\n".join(message_parts), + severity='error', + type_name=name, + locations=locations + )) + + +def _build_normalized_type_sigs(sigs: List[Signature]) -> Dict[str, List[Signature]]: + """Build dict of normalized signature string -> list of signatures (for types).""" + normalized_sigs: Dict[str, List[Signature]] = {} + for sig in sigs: + sig_str = sig.name + (sig.generic_params or '') + if sig_str not 
in normalized_sigs: + normalized_sigs[sig_str] = [] + normalized_sigs[sig_str].append(sig) + return normalized_sigs + + +def _append_stub_issues( + stubs: List[Signature], + canonical: Signature, + name: str, + parse_location: Callable[[str], Tuple[Path, int]], + issues: List[ValidationIssue], +) -> None: + """Append type_stub validation issues for stubs vs canonical.""" + for stub in stubs: + if not stub.has_body or ( + canonical.has_body and canonical.method_count > stub.method_count + ): + canonical_count = ( + canonical.method_count + if canonical.kind == 'interface' + else canonical.field_count + ) + stub_count = ( + stub.method_count + if stub.kind == 'interface' + else stub.field_count + ) + count_type = ( + 'method_count' if canonical.kind == 'interface' else 'field_count' + ) + stub_file, stub_line = parse_location(stub.location) + message = ( + f"Type/interface stub detected for '{name}':\n" + f" Canonical: {canonical.location} " + f"(has_body={canonical.has_body}, " + f"{count_type}={canonical_count})\n" + f" Stub: {stub.location} " + f"(has_body={stub.has_body}, " + f"{count_type}={stub_count})" + ) + issues.append(ValidationIssue.create( + "type_stub", + stub_file, + stub_line, + stub_line, + message=message, + severity='warning', + type_name=name, + canonical_location=canonical.location, + stub_location=stub.location + )) + + +def _append_method_inconsistency_if_stubs_differ( + method_key: str, + method_sigs: List[Signature], + canonical: Signature, + get_first_location: Callable[[List[str]], Tuple[Path, int]], + issues: List[ValidationIssue], +) -> None: + """If the stubs' normalized signatures differ, append one issue.""" + normalized_methods: Dict[str, List[Signature]] = {} + for method_sig in method_sigs: + norm_sig = normalize_go_signature(method_sig.normalized_signature()) + if norm_sig not in normalized_methods: + normalized_methods[norm_sig] = [] + normalized_methods[norm_sig].append(method_sig) + if len(normalized_methods) <= 1: + return 
+ locations = [ + msig.location + for msigs in normalized_methods.values() + for msig in msigs + ] + file_path, line_num = get_first_location(locations) + message_parts = [ + f"Method signature inconsistency for " + f"'{method_key}' in interface '{canonical.name}':" + ] + for norm_sig, msigs in normalized_methods.items(): + message_parts.append(f" Signature: {norm_sig}") + for msig in msigs: + message_parts.append(f" Location: {msig.location}") + issues.append(ValidationIssue.create( + "method_signature_inconsistency", + file_path, + line_num, + line_num, + message="\n".join(message_parts), + severity='error', + method_name=method_key, + interface_name=canonical.name, + locations=locations + )) + + +def _check_one_interface_method_consistency( + method_key: str, + method_sigs: List[Signature], + canonical: Signature, + *, + get_first_location: Callable[[List[str]], Tuple[Path, int]], + parse_location: Callable[[str], Tuple[Path, int]], + issues: List[ValidationIssue], +) -> None: + """Check one interface method's consistency (all stubs or canonical vs stubs).""" + canonical_method = None + stub_methods = [] + for method_sig in method_sigs: + if method_sig.has_body: + canonical_method = method_sig + else: + stub_methods.append(method_sig) + if not canonical_method: + _append_method_inconsistency_if_stubs_differ( + method_key, method_sigs, canonical, get_first_location, issues + ) + return + for stub_method in stub_methods: + canonical_norm = normalize_go_signature(canonical_method.normalized_signature()) + stub_norm = normalize_go_signature(stub_method.normalized_signature()) + if canonical_norm != stub_norm: + stub_file, stub_line = parse_location(stub_method.location) + message = ( + f"Interface stub method " + f"'{method_key}' differs from canonical:\n" + f" Canonical: {canonical_norm}\n" + f" Location: {canonical_method.location}\n" + f" Stub: {stub_norm}\n" + f" Location: {stub_method.location}" + ) + issues.append(ValidationIssue.create( + 
"interface_stub_method_differs", + stub_file, + stub_line, + stub_line, + message=message, + severity='error', + method_name=method_key, + canonical_location=canonical_method.location, + stub_location=stub_method.location + )) + + +def _collect_canonical_interface_method_names( + canonical: Signature, signatures: Dict[str, List[Signature]] +) -> set: + """Return set of method names defined in canonical interface block (same file, ~100 lines).""" + canonical_file, canonical_line_str = canonical.location.split(':', 1) + try: + canonical_line = int(canonical_line_str) + except ValueError: + canonical_line = 0 + names = set() + for sig_list in signatures.values(): + for sig in sig_list: + if not (sig.kind == 'method' and sig.receiver == canonical.name and not sig.has_body): + continue + sig_file, sig_line_str = sig.location.split(':', 1) + try: + sig_line = int(sig_line_str) + line_ok = canonical_line <= sig_line <= canonical_line + 100 + if sig_file == canonical_file and line_ok: + names.add(sig.name) + except ValueError: + pass + return names + + +def _build_interface_methods_map( + canonical: Signature, signatures: Dict[str, List[Signature]] +) -> Dict[str, List[Signature]]: + """Return map method_name -> list of Signature for methods on canonical.name.""" + out: Dict[str, List[Signature]] = {} + for sig_list in signatures.values(): + for sig in sig_list: + if sig.kind == 'method' and sig.receiver == canonical.name: + out.setdefault(sig.name, []).append(sig) + return out + + +def _check_interface_method_consistency( + canonical: Signature, + signatures: Dict[str, List[Signature]], + get_first_location: Callable[[List[str]], Tuple[Path, int]], + parse_location: Callable[[str], Tuple[Path, int]], + issues: List[ValidationIssue], +) -> None: + """Check interface method consistency (canonical vs stubs, method not in canonical).""" + canonical_method_names = _collect_canonical_interface_method_names(canonical, signatures) + interface_methods = 
_build_interface_methods_map(canonical, signatures) + if canonical_method_names and canonical.has_body: + for method_key, method_sigs in interface_methods.items(): + if method_key not in canonical_method_names: + locations = [msig.location for msig in method_sigs] + file_path, line_num = get_first_location(locations) + message_parts = [ + f"Method '{method_key}' is defined for interface '{canonical.name}' " + "but is not in the canonical interface definition:", + f" Canonical interface: {canonical.location}", + f" Methods in canonical: {sorted(canonical_method_names)}", + ] + for method_sig in method_sigs: + message_parts.append(f" Method location: {method_sig.location}") + issues.append(ValidationIssue.create( + "method_not_in_canonical_interface", + file_path, line_num, line_num, + message="\n".join(message_parts), + severity='error', + method_name=method_key, + interface_name=canonical.name, + canonical_location=canonical.location, + locations=locations, + )) + for method_key, method_sigs in interface_methods.items(): + if len(method_sigs) <= 1: + continue + _check_one_interface_method_consistency( + method_key, method_sigs, canonical, + get_first_location=get_first_location, + parse_location=parse_location, + issues=issues, + ) + + +def _process_method_group( + key: str, + sig_list: List[Signature], + get_first_location: Callable[[List[str]], Tuple[Path, int]], + issues: List[ValidationIssue], +) -> None: + """Check one method/function group for inconsistency or duplicate.""" + methods = [s for s in sig_list if s.kind in ('method', 'func')] + if not methods: + return + normalized_sigs: Dict[str, List[Signature]] = {} + for sig in methods: + norm_sig = normalize_go_signature(sig.normalized_signature()) + if norm_sig not in normalized_sigs: + normalized_sigs[norm_sig] = [] + normalized_sigs[norm_sig].append(sig) + if len(normalized_sigs) > 1: + locations = [ + sig.location + for sigs in normalized_sigs.values() + for sig in sigs + ] + file_path, line_num = 
get_first_location(locations) + message_parts = [f"Signature inconsistency for '{key}':"] + for norm_sig, sigs in normalized_sigs.items(): + message_parts.append(f" Signature: {norm_sig}") + for sig in sigs: + message_parts.append(f" Location: {sig.location}") + issues.append(ValidationIssue.create( + "signature_inconsistency", + file_path, + line_num, + line_num, + message="\n".join(message_parts), + severity='error', + signature_key=key, + locations=locations + )) + elif len(methods) > 1: + locations = [sig.location for sig in methods] + file_path, line_num = get_first_location(locations) + message_parts = [f"Duplicate identical signature for '{key}':"] + for sig in methods: + message_parts.append(f" Location: {sig.location}") + issues.append(ValidationIssue.create( + "duplicate_signature", + file_path, + line_num, + line_num, + message="\n".join(message_parts), + severity='error', + signature_key=key, + locations=locations + )) + + def check_signature_consistency( signatures: Dict[str, List[Signature]], - verbose: bool = False, - repo_root: Optional[Path] = None + _verbose: bool = False, + _repo_root: Optional[Path] = None ) -> List[ValidationIssue]: """ Check for signature inconsistencies. 
@@ -554,56 +741,17 @@ def get_first_location(location_strs: List[str]) -> Tuple[Path, int]: if sig.kind in ('type', 'interface'): all_types.append(sig) - # Check types/interfaces for conflicts (grouped by base name) if all_types: - # Group by base name - by_name = {} - for sig in all_types: - if sig.name not in by_name: - by_name[sig.name] = [] - by_name[sig.name].append(sig) - - # Check each name group for conflicts + by_name = _group_types_by_name(all_types) for name, sigs in by_name.items(): if len(sigs) <= 1: continue - - # Create normalized signatures including generics - normalized_sigs = {} - for sig in sigs: - # Include generics in signature - sig_str = sig.name - if sig.generic_params: - sig_str += sig.generic_params - - if sig_str not in normalized_sigs: - normalized_sigs[sig_str] = [] - normalized_sigs[sig_str].append(sig) - - # If different normalized signatures, it's a conflict + normalized_sigs = _build_normalized_type_sigs(sigs) if len(normalized_sigs) > 1: - locations = [ - sig.location - for sig_list in normalized_sigs.values() - for sig in sig_list - ] - file_path, line_num = get_first_location(locations) - message_parts = [f"Conflicting type definitions for '{name}':"] - for norm_sig, sig_list in normalized_sigs.items(): - message_parts.append(f" {norm_sig}:") - for sig in sig_list: - message_parts.append(f" Location: {sig.location}") - issues.append(ValidationIssue( - "conflicting_type_definitions", - file_path, - line_num, - line_num, - "\n".join(message_parts), - severity='error', - type_name=name, - locations=locations - )) - continue # Skip stub detection for this group + _append_type_conflict_issue( + name, normalized_sigs, get_first_location, issues + ) + continue # If same normalized signature, check for duplicates or stubs canonical = find_canonical_definition(sigs) @@ -614,282 +762,149 @@ def get_first_location(location_strs: List[str]) -> Tuple[Path, int]: stubs = [s for s in sigs if s != canonical] - # If there are no stubs (all are 
identical and canonical), it's still a duplicate if not stubs and len(sigs) > 1: - locations = [sig.location for sig in sigs] - file_path, line_num = get_first_location(locations) - message_parts = [f"Duplicate identical type/interface for '{name}':"] - for sig in sigs: - message_parts.append(f" Location: {sig.location}") - issues.append(ValidationIssue( - "duplicate_type", - file_path, - line_num, - line_num, - "\n".join(message_parts), - severity='warning', - type_name=name, - locations=locations - )) + _append_duplicate_type_issue(name, sigs, get_first_location, issues) + _append_stub_issues(stubs, canonical, name, parse_location, issues) - # Warn about stubs - for stub in stubs: - if not stub.has_body or ( - canonical.has_body and - canonical.method_count > stub.method_count - ): - canonical_count = ( - canonical.method_count - if canonical.kind == 'interface' - else canonical.field_count - ) - stub_count = ( - stub.method_count - if stub.kind == 'interface' - else stub.field_count - ) - count_type = ( - 'method_count' - if canonical.kind == 'interface' - else 'field_count' - ) - canonical_file, canonical_line = parse_location(canonical.location) - stub_file, stub_line = parse_location(stub.location) - message = ( - f"Type/interface stub detected for '{name}':" - f" Canonical: {canonical.location} " - f"(has_body={canonical.has_body}, " - f"{count_type}={canonical_count})\n" - f" Stub: {stub.location} " - f"(has_body={stub.has_body}, " - f"{count_type}={stub_count})" - ) - issues.append(ValidationIssue( - "type_stub", - stub_file, - stub_line, - stub_line, - message, - severity='warning', - type_name=name, - canonical_location=canonical.location, - stub_location=stub.location - )) - - # For interfaces, check method signatures in stubs vs canonical if canonical.kind == 'interface': - # First, collect methods that are in the canonical interface definition - # (methods inside the interface body have has_body=False and are in the same file - # and within a 
reasonable line range of the canonical definition) - canonical_method_names = set() - canonical_file, canonical_line_str = canonical.location.split(':', 1) - try: - canonical_line = int(canonical_line_str) - except ValueError: - canonical_line = 0 - - # Methods in the canonical interface should be in the same file, - # have has_body=False (inside interface body), and be within ~100 lines - # of the interface definition (reasonable range for interface body) - for sig_list in signatures.values(): - for sig in sig_list: - if (sig.kind == 'method' and - sig.receiver == canonical.name and - not sig.has_body): - sig_file, sig_line_str = sig.location.split(':', 1) - try: - sig_line = int(sig_line_str) - # Check if method is in same file and within reasonable range - if (sig_file == canonical_file and - canonical_line <= sig_line <= canonical_line + 100): - canonical_method_names.add(sig.name) - except ValueError: - pass - - # Get all methods for this interface from all signatures - interface_methods = {} - for sig_list in signatures.values(): - for sig in sig_list: - if sig.kind == 'method' and sig.receiver == canonical.name: - method_key = sig.name - if method_key not in interface_methods: - interface_methods[method_key] = [] - interface_methods[method_key].append(sig) - - # Check if any methods are defined that are NOT in the canonical definition - if canonical_method_names and canonical.has_body: - for method_key, method_sigs in interface_methods.items(): - if method_key not in canonical_method_names: - # Method is defined but not in canonical interface definition - locations = [msig.location for msig in method_sigs] - file_path, line_num = get_first_location(locations) - message_parts = [ - f"Method '{method_key}' is defined for interface " - f"'{canonical.name}' but is not in the canonical " - f"interface definition:", - f" Canonical interface: {canonical.location}", - f" Methods in canonical: {sorted(canonical_method_names)}" - ] - for method_sig in method_sigs: - 
message_parts.append(f" Method location: {method_sig.location}") - issues.append(ValidationIssue( - "method_not_in_canonical_interface", - file_path, - line_num, - line_num, - "\n".join(message_parts), - severity='error', - method_name=method_key, - interface_name=canonical.name, - canonical_location=canonical.location, - locations=locations - )) - - # Check each method for consistency - for method_key, method_sigs in interface_methods.items(): - if len(method_sigs) <= 1: - continue - - # Find canonical method (prefer has_body=True) - canonical_method = None - stub_methods = [] - for method_sig in method_sigs: - if method_sig.has_body: - canonical_method = method_sig - else: - stub_methods.append(method_sig) - - if not canonical_method: - # All are stubs, compare them - normalized_methods = {} - for method_sig in method_sigs: - norm_sig = normalize_go_signature(method_sig.normalized_signature()) - if norm_sig not in normalized_methods: - normalized_methods[norm_sig] = [] - normalized_methods[norm_sig].append(method_sig) - - if len(normalized_methods) > 1: - locations = [ - msig.location - for msigs in normalized_methods.values() - for msig in msigs - ] - file_path, line_num = get_first_location(locations) - message_parts = [ - f"Method signature inconsistency for " - f"'{method_key}' in interface '{canonical.name}':" - ] - for norm_sig, msigs in normalized_methods.items(): - message_parts.append(f" Signature: {norm_sig}") - for msig in msigs: - message_parts.append(f" Location: {msig.location}") - issues.append(ValidationIssue( - "method_signature_inconsistency", - file_path, - line_num, - line_num, - "\n".join(message_parts), - severity='error', - method_name=method_key, - interface_name=canonical.name, - locations=locations - )) - else: - # Compare stubs against canonical - for stub_method in stub_methods: - canonical_norm = normalize_go_signature( - canonical_method.normalized_signature() - ) - stub_norm = normalize_go_signature( - 
stub_method.normalized_signature() - ) - - if canonical_norm != stub_norm: - stub_file, stub_line = parse_location(stub_method.location) - message = ( - f"Interface stub method " - f"'{method_key}' differs from canonical:" - f" Canonical: {canonical_norm}\n" - f" Location: {canonical_method.location}\n" - f" Stub: {stub_norm}\n" - f" Location: {stub_method.location}" - ) - issues.append(ValidationIssue( - "interface_stub_method_differs", - stub_file, - stub_line, - stub_line, - message, - severity='error', - method_name=method_key, - canonical_location=canonical_method.location, - stub_location=stub_method.location - )) - - # Check methods/functions (grouped by normalized_key) + _check_interface_method_consistency( + canonical, signatures, get_first_location, parse_location, issues + ) + for key, sig_list in signatures.items(): if len(sig_list) <= 1: continue - - # Group by kind - skip types/interfaces as they're handled above - methods = [s for s in sig_list if s.kind in ('method', 'func')] - - # Check methods/functions - if methods: - # Normalize signatures for comparison - normalized_sigs = {} - for sig in methods: - norm_sig = normalize_go_signature(sig.normalized_signature()) - if norm_sig not in normalized_sigs: - normalized_sigs[norm_sig] = [] - normalized_sigs[norm_sig].append(sig) - - # Check for different signatures - if len(normalized_sigs) > 1: - # Different signatures found - ERROR - locations = [ - sig.location - for sigs in normalized_sigs.values() - for sig in sigs - ] - file_path, line_num = get_first_location(locations) - message_parts = [f"Signature inconsistency for '{key}':"] - for norm_sig, sigs in normalized_sigs.items(): - message_parts.append(f" Signature: {norm_sig}") - for sig in sigs: - message_parts.append(f" Location: {sig.location}") - issues.append(ValidationIssue( - "signature_inconsistency", - file_path, - line_num, - line_num, - "\n".join(message_parts), - severity='error', - signature_key=key, - locations=locations - )) - elif 
len(methods) > 1: - # Same signature, multiple locations - ERROR - # This will be handled in main() with canonical detection - locations = [sig.location for sig in methods] - file_path, line_num = get_first_location(locations) - message_parts = [f"Duplicate identical signature for '{key}':"] - for sig in methods: - message_parts.append(f" Location: {sig.location}") - issues.append(ValidationIssue( - "duplicate_signature", - file_path, - line_num, - line_num, - "\n".join(message_parts), - severity='error', - signature_key=key, - locations=locations - )) + _process_method_group(key, sig_list, get_first_location, issues) return issues +def _reported_keys_from_issues(issues: List[ValidationIssue]) -> Set[str]: + """Extract signature/type/method keys already reported in issues.""" + reported = set() + for issue in issues: + if not isinstance(issue, ValidationIssue): + continue + key = ( + issue.extra_fields.get('signature_key') or + issue.extra_fields.get('type_name') or + issue.extra_fields.get('method_name') + ) + if key: + reported.add(key) + else: + match = re.search(r"for '([^']+)'|'([^']+)'", issue.message) + if match: + reported.add(match.group(1) or match.group(2)) + return reported + + +def _append_unreported_duplicate_issue( + key: str, + sig_list: List[Signature], + repo_root: Path, + issues: List[ValidationIssue], +) -> None: + """Append one duplicate_signature issue for key/sig_list if unreported.""" + canonical_sig, _ctx, confidence, reason = find_canonical_signature( + sig_list, repo_root + ) + locations = [sig.location for sig in sig_list] + if locations: + location_str = locations[0] + if ':' in location_str: + file_str, line_str = location_str.split(':', 1) + try: + line_num = int(line_str) + except ValueError: + line_num = 1 + file_path = Path(file_str) + else: + file_path = Path(location_str) + line_num = 1 + else: + file_path = Path("unknown") + line_num = 1 + if canonical_sig and confidence >= 0.7: + message_parts = [ + f"Duplicate identical 
signature for '{key}':", + f" Suggested canonical (confidence: {confidence:.0%}): " + f"{canonical_sig.location} (reason: {reason})", + " Other locations:" + ] + other_locations = [ + f" {sig.location}" for sig in sig_list if sig != canonical_sig + ] + message_parts.extend(other_locations) + suggestion = f"Use canonical: {canonical_sig.location}" + else: + message_parts = [f"Duplicate identical signature for '{key}':"] + all_locations = [f" Location: {sig.location}" for sig in sig_list] + message_parts.extend(all_locations) + if canonical_sig: + message_parts.append( + f" Note: {canonical_sig.location} may be canonical " + f"(low confidence: {confidence:.0%}, reason: {reason})" + ) + suggestion = None + issues.append(ValidationIssue.create( + "duplicate_signature", + file_path, + line_num, + line_num, + message="\n".join(message_parts), + severity='error', + suggestion=suggestion, + signature_key=key, + locations=locations, + canonical_location=canonical_sig.location if canonical_sig else None + )) + + +def _collect_signatures( + md_files: List[Path], + repo_root: Path, + output: OutputBuilder, + verbose: bool, +) -> List[Signature]: + all_signatures: List[Signature] = [] + for md_file in md_files: + if verbose: + output.add_verbose_line(f'Extracting signatures from {md_file.name}...') + file_sigs = extract_signatures_from_markdown_file(md_file, repo_root) + all_signatures.extend(file_sigs) + return all_signatures + + +def _group_signatures( + all_signatures: List[Signature], +) -> Dict[str, List[Signature]]: + signatures_by_key: Dict[str, List[Signature]] = defaultdict(list) + for sig in all_signatures: + if is_example_signature_name(sig.name): + continue + key = sig.normalized_key() + signatures_by_key[key].append(sig) + return signatures_by_key + + +def _collect_unreported_duplicates( + all_signatures: List[Signature], + signatures_by_key: Dict[str, List[Signature]], + repo_root: Path, + issues: List[ValidationIssue], +) -> None: + duplicate_count = 
len(all_signatures) - len(signatures_by_key) + if duplicate_count <= 0: + return + reported_keys = _reported_keys_from_issues(issues) + for key, sig_list in signatures_by_key.items(): + sig_list = [sig for sig in sig_list if not is_example_signature_name(sig.name)] + if len(sig_list) < 2 or key in reported_keys: + continue + _append_unreported_duplicate_issue(key, sig_list, repo_root, issues) + + def main(): """Main entry point.""" # Show help if requested @@ -897,30 +912,15 @@ def main(): print(__doc__) return 0 - # Parse command line arguments - verbose = '--verbose' in sys.argv or '-v' in sys.argv - no_color = parse_no_color_flag(sys.argv) - no_fail = '--no-fail' in sys.argv - output_file = None - target_paths_str = None - - for i, arg in enumerate(sys.argv): - if arg in ('--output', '-o') and i + 1 < len(sys.argv): - output_file = sys.argv[i + 1] - elif arg in ('--path', '-p') and i + 1 < len(sys.argv): - target_paths_str = sys.argv[i + 1] + verbose, no_color, no_fail, output_file, target_paths_str = parse_cli_args( + sys.argv + ) # Parse comma-separated paths target_paths = parse_paths(target_paths_str) # Create output builder (header streams immediately if verbose) - output = OutputBuilder( - "Go Signature Consistency", - "Validates signature consistency within tech specs", - no_color=no_color, - verbose=verbose, - output_file=output_file - ) + output = build_output(verbose, no_color, output_file) # Find repository root repo_root = get_workspace_root() @@ -943,25 +943,14 @@ def main(): output.add_blank_line("working_verbose") # Collect all signatures - all_signatures = [] - for md_file in md_files: - if verbose: - output.add_verbose_line(f'Extracting signatures from {md_file.name}...') - file_sigs = extract_signatures_from_markdown_file(md_file, repo_root) - all_signatures.extend(file_sigs) + all_signatures = _collect_signatures(md_files, repo_root, output, verbose) if verbose: output.add_verbose_line(f'Found {len(all_signatures)} total signatures') 
output.add_blank_line("working_verbose") # Group signatures by normalized key, filtering out examples - signatures_by_key = defaultdict(list) - for sig in all_signatures: - # Skip example signatures - if is_example_signature_name(sig.name): - continue - key = sig.normalized_key() - signatures_by_key[key].append(sig) + signatures_by_key = _group_signatures(all_signatures) # Check for inconsistencies if verbose: @@ -969,141 +958,25 @@ def main(): output.add_blank_line("working_verbose") issues = check_signature_consistency( - signatures_by_key, verbose, repo_root=repo_root + signatures_by_key, verbose, _repo_root=repo_root ) # Check if signature counts don't match (indicates duplicates that weren't caught) - # Find all duplicate keys and generate specific warnings for any that weren't reported - duplicate_count = len(all_signatures) - len(signatures_by_key) - if duplicate_count > 0: - # Track which keys were already reported in issues - reported_keys = set() - import re - for issue in issues: - # issue is a ValidationIssue - if isinstance(issue, ValidationIssue): - # Try to extract key from extra_fields - key = ( - issue.extra_fields.get('signature_key') or - issue.extra_fields.get('type_name') or - issue.extra_fields.get('method_name') - ) - if key: - reported_keys.add(key) - else: - # Fallback: try to extract from message - match = re.search(r"for '([^']+)'|'([^']+)'", issue.message) - if match: - reported_keys.add(match.group(1) or match.group(2)) - - # Find which keys have duplicates that weren't already reported - for key, sig_list in signatures_by_key.items(): - # Filter out any example signatures that might have slipped through - sig_list = [sig for sig in sig_list if not is_example_signature_name(sig.name)] - if len(sig_list) < 2: - continue # Not enough non-example signatures to be a duplicate - if key not in reported_keys: - # Find canonical signature - canonical_sig, canonical_ctx, confidence, reason = find_canonical_signature( - sig_list, repo_root - ) - 
- # Build error message with canonical suggestion - locations = [sig.location for sig in sig_list] - # Get first location from list - if locations: - location_str = locations[0] - if ':' in location_str: - file_str, line_str = location_str.split(':', 1) - try: - line_num = int(line_str) - except ValueError: - line_num = 1 - file_path = Path(file_str) - else: - file_path = Path(location_str) - line_num = 1 - else: - file_path = Path("unknown") - line_num = 1 - # Build message parts in a single pass through sig_list - if canonical_sig and confidence >= 0.7: - # High confidence - suggest canonical - message_parts = [ - f"Duplicate identical signature for '{key}':", - f" Suggested canonical (confidence: {confidence:.0%}): " - f"{canonical_sig.location} (reason: {reason})", - " Other locations:" - ] - # Single loop: collect non-canonical locations - other_locations = [ - f" {sig.location}" for sig in sig_list if sig != canonical_sig - ] - message_parts.extend(other_locations) - suggestion = f"Use canonical: {canonical_sig.location}" - else: - # Low confidence - show all locations without suggestion - message_parts = [f"Duplicate identical signature for '{key}':"] - # Single loop: collect all locations - all_locations = [f" Location: {sig.location}" for sig in sig_list] - message_parts.extend(all_locations) - if canonical_sig: - message_parts.append( - f" Note: {canonical_sig.location} may be canonical " - f"(low confidence: {confidence:.0%}, reason: {reason})" - ) - suggestion = None - issues.append(ValidationIssue( - "duplicate_signature", - file_path, - line_num, - line_num, - "\n".join(message_parts), - severity='error', - suggestion=suggestion, - signature_key=key, - locations=locations, - canonical_location=canonical_sig.location if canonical_sig else None - )) + _collect_unreported_duplicates( + all_signatures, signatures_by_key, repo_root, issues + ) # Report results (headers will be added automatically by add_error_line/add_warning_line) # Filter issues by 
severity in a single loop - errors = [] - warnings = [] - for issue in issues: - if issue.matches(severity='error'): - errors.append(issue) - if issue.matches(severity='warning'): - warnings.append(issue) + errors, warnings = split_issues(issues) - for error in errors: - output.add_error_line(error.format_message(no_color=no_color)) - - for warning in warnings: - output.add_warning_line(warning.format_message(no_color=no_color)) + emit_issues(output, errors, warnings, no_color) # Always add summary section - summary_items = [ - ("Signatures checked:", len(all_signatures)), - ("Unique definitions:", len(signatures_by_key)), - ] - if errors: - summary_items.append(("Errors found:", len(errors))) - if warnings: - summary_items.append(("Warnings found:", len(warnings))) - output.add_summary_header() - output.add_summary_section(summary_items) + emit_summary(output, all_signatures, signatures_by_key, errors, warnings) # Final message: success if no errors, but mention warnings if present - if not errors: - if warnings: - output.add_success_message( - 'All signatures are consistent! (Some warnings were found - see above)' - ) - else: - output.add_success_message('All signatures are consistent!') - else: - output.add_failure_message("Validation failed. 
Please fix the errors above.") + emit_final_message(output, errors, warnings) output.print() # Only treat actual errors as failures, not warnings diff --git a/scripts/validate_heading_numbering.py b/scripts/validate_heading_numbering.py index bee4c7dc..a6c554a1 100644 --- a/scripts/validate_heading_numbering.py +++ b/scripts/validate_heading_numbering.py @@ -11,7 +11,7 @@ - Child heading numbers match their parent section number - Headings are not overly-deeply nested (flags H6 and beyond) - Headings follow Title Case capitalization (warnings with suggestions) -- Organizational headings with no content are flagged as errors +- Organizational headings with no content are flagged as warnings - H1 headings warn on numbering and error on duplicates Usage: @@ -49,80 +49,43 @@ docs/requirements,docs/tech_specs """ -import re +import argparse import sys from pathlib import Path from collections import defaultdict from typing import List -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" - -# Import shared utilities -for module_path in (str(scripts_dir), str(lib_dir)): - if module_path not in sys.path: - sys.path.insert(0, module_path) - -# Import shared utilities -from lib._validation_utils import ( # noqa: E402 - OutputBuilder, parse_no_color_flag, format_issue_message, +from lib._validation_utils import ( + OutputBuilder, parse_no_color_flag, get_workspace_root, parse_paths, - build_heading_hierarchy, is_organizational_heading, - remove_backticks_keep_content, has_backticks, get_backticks_error_message, ValidationIssue, find_markdown_files ) - - -# Constants for validation thresholds -MAX_HEADING_NUMBER_SEGMENT = 20 # Maximum value for any number segment in heading numbering -MAX_ORGANIZATIONAL_PROSE_LINES = 5 # Maximum lines of prose for organizational heading check - -# Compiled regex patterns for performance (module level) -_RE_HEADING_PATTERN = re.compile(r'^(#{1,})\s+(.+)$') -_RE_NUMBERED_HEADING_PATTERN = 
re.compile(r'^([0-9]+(?:\.[0-9]+)*)\.?\s+(.+)$') -_RE_SPLIT_WORDS = re.compile(r'\S+|\s+') -_RE_WHITESPACE_ONLY = re.compile(r'^\s+$') -_RE_FILENAME_PATTERN = re.compile(r'^[\w\-]+\.(\w+)$') -_RE_NON_WORD_CHARS = re.compile(r'[^\w]') -_RE_NUMBERING_PREFIX = re.compile(r'^([0-9]+(?:\.[0-9]+)*)\.?\s+(.+)$') -_RE_FIRST_LETTER = re.compile(r'[a-zA-Z]') - - -class HeadingInfo: - """Represents a heading with its metadata for sorting.""" - def __init__(self, file, line_num, level, numbers, heading_text, full_line, - parent=None, issue=None): - self.file = file - self.line_num = line_num - self.level = level - self.numbers = numbers # List of integers - self.heading_text = heading_text - self.full_line = full_line - self.parent = parent # Reference to parent HeadingInfo (if any) - self.issue = issue # Reference to related HeadingIssue (if any) - self.original_number = '.'.join(map(str, numbers)) # Original number as string - self.corrected_number = None # Will be set during correction calculation - self.has_period = False # For H2: whether period follows number in original - self.corrected_capitalization = None # Will be set during capitalization check - - def sort_key(self): - """Return a sort key for proper numeric ordering.""" - # Create a tuple: (numbers as tuple, level) - # Sort by numeric values first, then by level for same numbers - # This ensures proper numeric sorting (e.g., [1, 10] comes after [1, 2]) - return (tuple(self.numbers), self.level) +from lib._validate_heading_numbering_title_case import to_title_case as _to_title_case +from lib._validate_heading_numbering_report import print_summary as _print_summary_report +from lib._validate_heading_numbering_models import ( + HeadingInfo, + RE_HEADING_PATTERN, + RE_NUMBERED_HEADING_PATTERN, +) +from lib._validate_heading_numbering_helpers import ( + validate_heading_structure as _validate_heading_structure, + is_go_code_related_heading as _is_go_code_related_heading, +) +from lib.heading_numbering import ( + 
check_duplicate_headings as _check_duplicate_headings, + check_excessive_numbering as _check_excessive_numbering, + check_h2_period_consistency as _check_h2_period_consistency, + check_heading_capitalization as _check_heading_capitalization, + check_organizational_headings as _check_organizational_headings, + check_single_word_headings as _check_single_word_headings, +) class HeadingValidator: """Validates heading numbering in markdown files.""" - # Pattern to match markdown headings: ##+ followed by optional space and number - HEADING_PATTERN = _RE_HEADING_PATTERN - - # Pattern to match numbered headings: starts with number(s) followed by - # optional period and space. Uses [0-9]+\. pattern to match each number - # segment explicitly. Handles both "1 Title" and "1. Title" formats - NUMBERED_HEADING_PATTERN = _RE_NUMBERED_HEADING_PATTERN + HEADING_PATTERN = RE_HEADING_PATTERN + NUMBERED_HEADING_PATTERN = RE_NUMBERED_HEADING_PATTERN def __init__(self, verbose=False, repo_root=None, no_color=False): self.verbose = verbose @@ -181,37 +144,10 @@ def get_heading_level(self, heading_prefix): """Get heading level from markdown prefix (##, ###, etc.).""" return len(heading_prefix) - def check_and_record_backticks(self, filepath, line_num, heading_text, full_line, heading_info): - """ - Check for backticks in heading text and record error if found. 
- - Args: - filepath: Path to the file - line_num: Line number of the heading - heading_text: The heading text to check (without number prefix) - full_line: The full heading line - heading_info: HeadingInfo object to associate with the error - """ - if has_backticks(heading_text): - error = ValidationIssue( - "heading_formatting", - Path(filepath), - line_num, - line_num, - get_backticks_error_message(), - severity='error', - heading=full_line, - heading_info=heading_info - ) - self.issues.append(error) - heading_info.issue = error - if self.first_error_line[filepath] is None: - self.first_error_line[filepath] = error.start_line - def build_corrected_full_line(self, heading_info): """ Build the corrected full heading line with numbering and capitalization fixes. - Removes backticks from the heading text in suggestions. + Preserves backticks in heading text (case inside backticks is not changed). Returns the full markdown heading line with corrections applied. """ @@ -227,14 +163,11 @@ def build_corrected_full_line(self, heading_info): if heading_info.corrected_number: corrected_number = heading_info.corrected_number - # Determine the corrected heading text (capitalization) + # Determine the corrected heading text (capitalization; backticks preserved) corrected_text = heading_info.heading_text if heading_info.corrected_capitalization: corrected_text = heading_info.corrected_capitalization - # Remove backticks from the heading text for suggestions (keep content) - corrected_text = remove_backticks_keep_content(corrected_text) - # Trim extra whitespace from the text corrected_text = corrected_text.strip() @@ -257,293 +190,7 @@ def build_corrected_full_line(self, heading_info): # Build the corrected line if heading_info.level == 2 and has_period: return f"{prefix} {corrected_number}. 
{corrected_text}" - else: - return f"{prefix} {corrected_number} {corrected_text}" - - def _find_backtick_ranges(self, text): - """Find all backtick-enclosed sections and their positions.""" - backtick_ranges = [] - i = 0 - while i < len(text): - if text[i] == '`': - start = i - i += 1 - # Find the closing backtick - while i < len(text) and text[i] != '`': - i += 1 - if i < len(text): # Found closing backtick - end = i + 1 - backtick_ranges.append((start, end)) - i = end - else: - # Unclosed backtick, treat as regular text - break - else: - i += 1 - return backtick_ranges - - def _is_in_backticks(self, pos, backtick_ranges): - """Check if a character position is inside backticks.""" - for start, end in backtick_ranges: - if start <= pos < end: - return True - return False - - def _should_preserve_part(self, part, part_start, part_end, backtick_ranges): - """Check if a part should be preserved as-is (backticks, underscores, filenames).""" - # Check if part is inside backticks - part_in_backticks = any( - self._is_in_backticks(pos, backtick_ranges) - for pos in range(part_start, part_end) - ) - if part_in_backticks: - return True - - # Check if word contains underscores (e.g., file_name.go, some_function) - if '_' in part: - return True - - # Check if word looks like a filename - common_extensions = [ - 'go', 'md', 'txt', 'json', 'yaml', 'yml', 'xml', 'html', 'css', 'js', - 'ts', 'py', 'sh', 'bat', 'ps1', 'java', 'c', 'cpp', 'h', 'hpp', - 'rs', 'rb', 'php', 'sql', 'csv', 'tsv', 'log', 'conf', 'config', - 'ini', 'toml', 'lock', 'sum', 'mod', 'gitignore', 'editorconfig' - ] - filename_match = _RE_FILENAME_PATTERN.match(part) - if filename_match: - extension = filename_match.group(1).lower() - if extension in common_extensions: - return True - - return False - - def _is_in_code_parentheses(self, part, parts, part_index, text): - """Check if a word is inside parentheses with code-like content (backticks).""" - text_up_to_here = ''.join(parts[:part_index + 1]) - if '(' 
not in text_up_to_here: - return False - - # Find the most recent unclosed parenthesis - last_open_paren = text_up_to_here.rfind('(') - if last_open_paren < 0: - return False - - # Count parentheses to see if we're inside an unclosed one - text_before = ''.join(parts[:part_index + 1]) - open_count = text_before.count('(') - text_before.count(')') - if open_count <= 0: - return False - - # We're inside parentheses - check if they contain backticks - text_from_paren = text[last_open_paren:] - close_paren_pos = text_from_paren.find(')', 1) - if close_paren_pos > 0: - parens_content = text_from_paren[1:close_paren_pos] - else: - parens_content = text_from_paren[1:] - - return '`' in parens_content - - def _should_capitalize_word( - self, word_clean, is_first, is_last, is_in_code_parens, previous_word_clean=None - ): - """ - Determine if a word should be capitalized based on title case rules. - - Args: - word_clean: The word to check (lowercase, cleaned) - is_first: True if this is the first word - is_last: True if this is the last word - is_in_code_parens: True if word is in code-like parentheses context - previous_word_clean: The previous word (lowercase, cleaned) for phrasal verb detection - """ - lowercase_words = { - 'a', 'an', 'the', # Articles - 'and', 'but', 'or', 'nor', 'for', 'so', 'yet', # Coordinating conjunctions - 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by', 'from', 'up', 'about', - 'into', 'onto', 'upon', 'over', 'under', 'above', 'below', 'across', - 'via' # Short prepositions - } - capitalize_prepositions = { - 'through', 'between', 'among', 'during', 'before', 'after', 'within', - 'without', 'against', 'along', 'around', 'behind', 'beside', 'beyond', - 'inside', 'outside', 'throughout', 'toward', 'towards', 'underneath' - } - programming_keywords = { - 'return', 'if', 'else', 'for', 'while', 'do', 'switch', 'case', - 'break', 'continue', 'goto', 'throw', 'try', 'catch', 'finally', - 'new', 'delete', 'this', 'super', 'static', 'const', 'let', 
'var', - 'function', 'class', 'interface', 'enum', 'type', 'import', 'export', - 'async', 'await', 'yield', 'def', 'lambda', 'pass', 'raise', 'except' - } - - # Phrasal verb particles that should be capitalized when part of a phrasal verb - phrasal_particles = {'up', 'down', 'out', 'off', 'in', 'on', 'over', 'away', 'back'} - - # Common phrasal verb bases (verbs that commonly form phrasal verbs) - phrasal_verb_bases = { - 'clean', 'set', 'look', 'pick', 'give', 'make', 'break', 'build', 'call', - 'check', 'close', 'come', 'cut', 'fill', 'get', 'go', 'grow', 'hang', - 'hold', 'keep', 'line', 'live', 'move', 'open', 'pull', 'put', 'show', - 'sign', 'stand', 'start', 'take', 'turn', 'wake', 'warm', 'wrap', 'bring', - 'carry', 'catch', 'come', 'do', 'draw', 'drop', 'end', 'fall', 'find', - 'fix', 'follow', 'hand', 'head', 'help', 'join', 'jump', 'knock', 'lay', - 'leave', 'let', 'lie', 'lock', 'log', 'mix', 'pass', 'pay', 'point', - 'pop', 'push', 'read', 'run', 'send', 'shut', 'sit', 'slow', 'sort', - 'speed', 'split', 'spread', 'step', 'stick', 'stop', 'switch', 'talk', - 'tear', 'think', 'throw', 'tie', 'try', 'turn', 'use', 'walk', 'wash', - 'watch', 'wear', 'wind', 'work', 'write' - } - - # If word is in code-like parentheses context, keep programming keywords lowercase - if is_in_code_parens and word_clean in programming_keywords: - return False - if is_first or is_last: - return True - if word_clean in capitalize_prepositions: - return True - - # Check if this is a phrasal verb particle (like "up" in "Clean Up") - if word_clean in phrasal_particles and previous_word_clean: - if previous_word_clean in phrasal_verb_bases: - # This is part of a phrasal verb, so capitalize it - return True - - if word_clean not in lowercase_words: - return True - return False - - def _capitalize_word(self, part, should_capitalize): - """Apply capitalization to a word part.""" - if not part: - return part - - # Find first letter using regex - match = _RE_FIRST_LETTER.search(part) - 
if not match: - return part - - first_letter_idx = match.start() - - if should_capitalize: - # Capitalize first letter, preserve rest - return ( - part[:first_letter_idx] + - part[first_letter_idx].upper() + - part[first_letter_idx + 1:] - ) - else: - # Lowercase the word, but preserve existing capitalization - # that suggests proper nouns - has_internal_capitals = any( - c.isupper() for c in part[first_letter_idx + 1:] - if c.isalpha() - ) - - if has_internal_capitals: - # Preserve existing capitalization (likely proper noun/acronym) - # Only lowercase the first letter if it's uppercase - if part[first_letter_idx].isupper(): - return ( - part[:first_letter_idx] + - part[first_letter_idx].lower() + - part[first_letter_idx + 1:] - ) - else: - return part - else: - # No internal capitals, lowercase normally - return ( - part[:first_letter_idx] + - part[first_letter_idx].lower() + - part[first_letter_idx + 1:].lower() - ) - - def to_title_case(self, text): - """ - Convert text to Title Case following standard rules. - - Title Case rules: - - Capitalize first and last word - - Capitalize all major words (nouns, verbs, adjectives, adverbs) - - Lowercase articles (a, an, the) - - Lowercase coordinating conjunctions (and, but, or, nor, for, so, yet) - - Lowercase short prepositions (in, on, at, to, for, of, with, by, etc.) - unless they're the first or last word - - Capitalize prepositions of 4+ letters (through, between, etc.) 
- - Preserve existing capitalization of proper nouns and acronyms - - Preserve words inside backticks (code) exactly as-is - """ - if not text: - return text - - # Find all backtick-enclosed sections - backtick_ranges = self._find_backtick_ranges(text) - - # Split text into words and separators, preserving structure - parts = _RE_SPLIT_WORDS.findall(text) - result_parts = [] - word_indices = [] # Track which parts are words (not whitespace) - - # First pass: identify words and track their positions in original text - char_pos = 0 - part_positions = [] # Track character positions for each part - for i, part in enumerate(parts): - if not _RE_WHITESPACE_ONLY.match(part): # Not just whitespace - word_indices.append(i) - part_positions.append((char_pos, char_pos + len(part))) - char_pos += len(part) - - if not word_indices: - return text - - # Process each part - for i, part in enumerate(parts): - if _RE_WHITESPACE_ONLY.match(part): - # Preserve whitespace as-is - result_parts.append(part) - continue - - part_start, part_end = part_positions[i] - - # Check if part should be preserved as-is - if self._should_preserve_part(part, part_start, part_end, backtick_ranges): - result_parts.append(part) - continue - - # This is a word - word_clean = _RE_NON_WORD_CHARS.sub('', part.lower()) - if not word_clean: - result_parts.append(part) - continue - - # Check if it's the first or last word - is_first = i == word_indices[0] - is_last = i == word_indices[-1] - - # Get previous word for phrasal verb detection - previous_word_clean = None - if not is_first: - # Find the previous word index - current_word_index = word_indices.index(i) - if current_word_index > 0: - prev_word_i = word_indices[current_word_index - 1] - prev_part = parts[prev_word_i] - previous_word_clean = _RE_NON_WORD_CHARS.sub('', prev_part.lower()) - - # Check if this word is inside parentheses with code-like content - is_in_code_parens = self._is_in_code_parentheses(part, parts, i, text) - - # Determine if word 
should be capitalized - should_capitalize = self._should_capitalize_word( - word_clean, is_first, is_last, is_in_code_parens, previous_word_clean - ) - - # Apply capitalization - result_parts.append(self._capitalize_word(part, should_capitalize)) - - return ''.join(result_parts) + return f"{prefix} {corrected_number} {corrected_text}" def check_capitalization(self, heading_text): """ @@ -555,8 +202,8 @@ def check_capitalization(self, heading_text): if not heading_text: return True, heading_text - corrected = self.to_title_case(heading_text) - is_correct = (heading_text == corrected) + corrected = _to_title_case(heading_text) + is_correct = heading_text == corrected return is_correct, corrected @@ -572,12 +219,12 @@ def _read_file_lines(self, filepath): return f.readlines() except (IOError, OSError) as e: # File read errors - create ValidationIssue - error = ValidationIssue( + error = ValidationIssue.create( "file_read_error", Path(filepath), 0, 0, - f"Could not read file: {e}", + message=f"Could not read file: {e}", severity='error' ) self.issues.append(error) @@ -585,25 +232,25 @@ def _read_file_lines(self, filepath): return None except UnicodeDecodeError as e: # Encoding errors - create ValidationIssue - error = ValidationIssue( + error = ValidationIssue.create( "file_encoding_error", Path(filepath), 0, 0, - f"Could not decode file (encoding issue): {e}", + message=f"Could not decode file (encoding issue): {e}", severity='error' ) self.issues.append(error) self.log(f" Error decoding file: {e}") return None - except Exception as e: + except (MemoryError, RuntimeError, BufferError) as e: # Unexpected errors - create ValidationIssue - error = ValidationIssue( + error = ValidationIssue.create( "unexpected_error", Path(filepath), 0, 0, - f"Unexpected error reading file: {e}", + message=f"Unexpected error reading file: {e}", severity='error' ) self.issues.append(error) @@ -611,7 +258,7 @@ def _read_file_lines(self, filepath): return None def _process_heading_line( - 
self, filepath, line_num, line, stripped_line, heading_stack, in_code_block + self, filepath, line_num, line, *, stripped_line, heading_stack, in_code_block ): """ Process a single line to check if it's a heading and extract heading info. @@ -639,12 +286,12 @@ def _process_heading_line( if line != line.lstrip(): msg = ("Heading has leading whitespace. " "This is likely a linting error that should be fixed.") - warning = ValidationIssue( + warning = ValidationIssue.create( "heading_leading_whitespace", Path(filepath), line_num, line_num, - msg, + message=msg, severity='warning', heading=line.strip() ) @@ -660,12 +307,12 @@ def _process_heading_line( msg = (f"H{level} heading is too deeply nested. " "Consider restructuring the document to use " "H2-H5 only.") - warning = ValidationIssue( + warning = ValidationIssue.create( "heading_leading_whitespace", Path(filepath), line_num, line_num, - msg, + message=msg, severity='warning', heading=line.strip() ) @@ -678,21 +325,23 @@ def _process_heading_line( return None, in_code_block, True # Parse heading number - numbers, title = self.parse_heading_number(heading_text) + numbers, _title = self.parse_heading_number(heading_text) # If this heading is not numbered, create HeadingInfo for it if numbers is None: return self._create_unnumbered_heading( - filepath, line_num, level, heading_text, line, heading_stack + filepath, line_num, level, heading_text, + line=line, heading_stack=heading_stack ), in_code_block, True # Process numbered heading return self._create_numbered_heading( - filepath, line_num, level, heading_text, line, numbers, heading_stack + filepath, line_num, level, heading_text, + line=line, numbers=numbers, heading_stack=heading_stack ), in_code_block, False def _create_unnumbered_heading( - self, filepath, line_num, level, heading_text, line, heading_stack + self, filepath, line_num, level, heading_text, *, line, heading_stack ): """Create HeadingInfo for an unnumbered heading.""" # Find parent heading from 
current stack @@ -704,17 +353,13 @@ def _create_unnumbered_heading( # Create HeadingInfo with empty numbers list and "MISSING" as original_number heading_info = HeadingInfo( - filepath, line_num, level, [], heading_text, line.strip(), - parent=parent_heading, issue=None + filepath, line_num, level, [], + heading_text=heading_text, + full_line=line.strip(), parent=parent_heading, issue=None ) heading_info.original_number = "MISSING" heading_info.has_period = False - # Check for backticks in heading text - self.check_and_record_backticks( - filepath, line_num, heading_text, line.strip(), heading_info - ) - return heading_info def _check_h1_heading(self, filepath, line_num, heading_text, full_line): @@ -724,12 +369,12 @@ def _check_h1_heading(self, filepath, line_num, heading_text, full_line): self.h1_first_line[filepath] = line_num else: msg = "More than one H1 heading found. Only the first H1 heading is valid." - error = ValidationIssue( + error = ValidationIssue.create( "heading_multiple_h1", Path(filepath), line_num, line_num, - msg, + message=msg, severity='error', heading=full_line ) @@ -740,19 +385,19 @@ def _check_h1_heading(self, filepath, line_num, heading_text, full_line): numbers, _ = self.parse_heading_number(heading_text) if numbers is not None: msg = "H1 heading should not be numbered." 
- warning = ValidationIssue( + warning = ValidationIssue.create( "heading_h1_numbering", Path(filepath), line_num, line_num, - msg, + message=msg, severity='warning', heading=full_line ) self.issues.append(warning) def _create_numbered_heading( - self, filepath, line_num, level, heading_text, line, numbers, heading_stack + self, filepath, line_num, level, heading_text, *, line, numbers, heading_stack ): """Create HeadingInfo for a numbered heading.""" # Extract original number string from heading_text using regex @@ -793,121 +438,25 @@ def _create_numbered_heading( has_period = True heading_info = HeadingInfo( - filepath, line_num, level, numbers, title, line.strip(), - parent=parent_heading, issue=None + filepath, line_num, level, numbers, + heading_text=title, + full_line=line.strip(), parent=parent_heading, issue=None ) # Set original_number from the actual string extracted from the file heading_info.original_number = original_number_str heading_info.has_period = has_period # Track period for H2 headings - # Check for backticks in heading text (after number removed) - self.check_and_record_backticks( - filepath, line_num, title, line.strip(), heading_info - ) - return heading_info def _validate_heading_structure(self, filepath, headings, unnumbered_headings): - """ - Validate heading structure after parsing. - - Returns: - List of headings (may be modified with issues) - """ - # Check if we have any headings - if not headings: - return [] - - # Filter H2 headings once and reuse - h2_headings = [h for h in headings if h.level == 2] - if not h2_headings: - # No H2 headings - skip numbering validation but return headings for other checks - return headings - - # Check if the first H2 heading is numbered - first_h2 = min(h2_headings, key=lambda h: h.line_num) - first_h2_is_numbered = ( - first_h2.numbers and - len(first_h2.numbers) > 0 and - first_h2.original_number != "MISSING" + """Validate heading structure after parsing. 
Returns list of headings.""" + return _validate_heading_structure( + filepath, headings, unnumbered_headings, + issues=self.issues, + first_error_line=self.first_error_line, + log_fn=self.log ) - if first_h2_is_numbered and unnumbered_headings: - # First H2 is numbered, so unnumbered headings are errors - for line_num, level, heading_text, full_line, heading_info in unnumbered_headings: - msg = (f"H{level} heading is missing numbering. " - "This document uses numbered headings, so all headings must be numbered.") - error = ValidationIssue( - "heading_missing_numbering", - Path(filepath), - line_num, - line_num, - msg, - severity='error', - heading=full_line, - heading_info=heading_info - ) - self.issues.append(error) - heading_info.issue = error - if self.first_error_line[filepath] is None: - self.first_error_line[filepath] = line_num - - # Check if the first H2 heading is numbered - # Only perform numbering validation if the first H2 is numbered - is_first_h2_numbered = ( - first_h2.numbers and - len(first_h2.numbers) > 0 and - first_h2.original_number != "MISSING" - ) - - if not is_first_h2_numbered: - # First H2 is not numbered - skip numbering validation - # but still return headings for other checks (capitalization, duplicates, etc.) - return headings - - # First H2 is numbered - validate it must be "0. Title" or "1. Title" - if first_h2.numbers[0] not in [0, 1]: - error = ValidationIssue( - "heading_first_h2_numbering", - Path(filepath), - first_h2.line_num, - first_h2.line_num, - (f"First H2 heading must be numbered '0' or '1', " - f"got '{first_h2.numbers[0]}'. 
" - "Please run a markdown linter to fix basic heading order, " - "then re-run this script."), - severity='error', - heading=first_h2.full_line, - heading_info=first_h2 - ) - self.issues.append(error) - first_h2.issue = error - return headings # Continue with validation despite this error - - # Verify that only H2 headings have parent = None - for heading in headings: - if heading.level == 2 and heading.parent is not None: - # This shouldn't happen, but log it if it does - self.log(f" Warning: H2 heading at line {heading.line_num} has a parent") - elif heading.level > 2 and heading.parent is None: - # This is an error - H3+ should have a parent - error = ValidationIssue( - "heading_no_parent", - Path(filepath), - heading.line_num, - heading.line_num, - (f"H{heading.level} heading has no parent. " - "Please run a markdown linter to fix basic heading order, " - "then re-run this script."), - severity='error', - heading=heading.full_line, - heading_info=heading - ) - self.issues.append(error) - heading.issue = error - - return headings - def build_heading_structure(self, filepath): """ First pass: Build the complete heading structure. 
@@ -928,7 +477,10 @@ def build_heading_structure(self, filepath): stripped_line = line.strip() heading_info, in_code_block, should_continue = self._process_heading_line( - filepath, line_num, line, stripped_line, heading_stack, in_code_block + filepath, line_num, line, + stripped_line=stripped_line, + heading_stack=heading_stack, + in_code_block=in_code_block ) if should_continue: @@ -955,14 +507,44 @@ def build_heading_structure(self, filepath): heading_stack[heading_info.level] = heading_info # Clear deeper levels when we move up in hierarchy - levels_to_clear = [lvl for lvl in heading_stack.keys() if lvl > heading_info.level] + levels_to_clear = [lvl for lvl in heading_stack if lvl > heading_info.level] for lvl in levels_to_clear: del heading_stack[lvl] # Validate heading structure return self._validate_heading_structure(filepath, headings, unnumbered_headings) - def calculate_corrected_numbers(self, filepath, headings): + def _apply_h2_corrected_numbers(self, h2_headings: list) -> None: + """Set corrected_number on H2 headings (sorted by line_num).""" + start_number = 0 + if h2_headings: + first_h2_numbers = h2_headings[0].numbers + start_number = 0 if (first_h2_numbers and not first_h2_numbers[0]) else 1 + h2_sequence = start_number - 1 + for heading in h2_headings: + h2_sequence += 1 + heading.corrected_number = str(h2_sequence) + + def _apply_level_corrected_numbers(self, headings: list, level: int) -> None: + """Set corrected_number on headings at level (H3–H6) using parent sequences.""" + level_headings = [ + h for h in headings + if h.level == level and h.numbers and h.original_number != "MISSING" + ] + if not level_headings: + return + level_headings.sort(key=lambda h: h.line_num) + parent_sequences = {} + for heading in level_headings: + if heading.parent and heading.parent.corrected_number: + parent_corrected = heading.parent.corrected_number + parent_sequences[parent_corrected] = parent_sequences.get(parent_corrected, 0) + 1 + seq = 
parent_sequences[parent_corrected] + heading.corrected_number = f"{parent_corrected}.{seq}" + else: + heading.corrected_number = heading.original_number + + def calculate_corrected_numbers(self, _filepath, headings): """ Calculate corrected numbers for ALL headings level by level. This happens BEFORE validation - we determine what the numbers SHOULD be, @@ -970,73 +552,74 @@ def calculate_corrected_numbers(self, filepath, headings): """ if not headings: return - - # Filter numbered headings and numbered H2 headings in a single pass - numbered_headings = [] - h2_headings = [] - for h in headings: - if h.numbers and h.original_number != "MISSING": - numbered_headings.append(h) - if h.level == 2: - h2_headings.append(h) - + numbered_headings = [h for h in headings if h.numbers and h.original_number != "MISSING"] + h2_headings = [h for h in numbered_headings if h.level == 2] if not numbered_headings: - # No numbered headings - skip numbering calculation return - - # Process H2 headings first (sorted by line number) h2_headings.sort(key=lambda h: h.line_num) + self._apply_h2_corrected_numbers(h2_headings) + for level in range(3, 7): + self._apply_level_corrected_numbers(headings, level) - # Determine starting number from first H2 heading - # If first H2 is "0", start from 0; otherwise start from 1 - start_number = 0 - if h2_headings: - first_h2_numbers = h2_headings[0].numbers - if first_h2_numbers and first_h2_numbers[0] == 0: - start_number = 0 - else: - start_number = 1 - - h2_sequence = start_number - 1 # Will be incremented to start_number - for heading in h2_headings: - h2_sequence += 1 - # corrected_number is just the number (no period) - heading.corrected_number = str(h2_sequence) - - # Process H3+ headings level by level - # Only process numbered headings - for level in range(3, 7): # H3 through H6 - level_headings = [ - h for h in headings - if h.level == level and h.numbers and h.original_number != "MISSING" - ] - if not level_headings: - continue - - 
level_headings.sort(key=lambda h: h.line_num) - - # Track sequence for each parent (by parent's corrected_number) - parent_sequences = {} # Maps parent corrected_number -> current sequence - - for heading in level_headings: - # Get parent's corrected number - if heading.parent and heading.parent.corrected_number: - parent_corrected = heading.parent.corrected_number - - # Initialize sequence for this parent if needed - if parent_corrected not in parent_sequences: - parent_sequences[parent_corrected] = 0 - - # Increment sequence for this parent - parent_sequences[parent_corrected] += 1 - sequence_num = parent_sequences[parent_corrected] + def _find_prev_heading_same_level_parent(self, headings_by_line, heading): + """Return previous heading at same level with same parent, or None.""" + if heading.level <= 2 or not heading.parent: + return None + for h in headings_by_line: + if h.line_num >= heading.line_num: + break + if h.level == heading.level and h.parent == heading.parent: + return h + return None + + def _record_heading_issue(self, filepath, heading, error): + """Append issue, set heading.issue, and update first_error_line.""" + self.issues.append(error) + heading.issue = error + if self.first_error_line[filepath] is None: + self.first_error_line[filepath] = heading.line_num + + def _record_depth_mismatch_if_needed( + self, filepath, heading, level, *, expected_depth, actual_depth + ): + """If depth mismatch, record error and return True; else return False.""" + if actual_depth == expected_depth: + return False + error = ValidationIssue.create( + "heading_depth_mismatch", + Path(filepath), + heading.line_num, + heading.line_num, + message=( + f"H{level} heading has {actual_depth} number(s), " + f"expected {expected_depth}" + ), + severity='error', + heading=heading.full_line, + heading_info=heading + ) + self._record_heading_issue(filepath, heading, error) + return True - # Build corrected number: parent.corrected_number.sequence_num - heading.corrected_number = 
f"{parent_corrected}.{sequence_num}" - else: - # No parent or parent doesn't have corrected number yet - # This shouldn't happen if structure is correct, but handle gracefully - heading.corrected_number = heading.original_number + def _record_no_parent_if_needed(self, filepath, heading, level): + """If H3+ has no parent, record error and return True; else return False.""" + if level <= 2 or heading.parent is not None: + return False + error = ValidationIssue.create( + "heading_no_parent_in_validation", + Path(filepath), + heading.line_num, + heading.line_num, + message=( + f"H{level} heading has no parent. " + "Please run a markdown linter to fix basic heading order." + ), + severity='error', + heading=heading.full_line, + heading_info=heading + ) + self._record_heading_issue(filepath, heading, error) + return True def validate_numbering(self, filepath, headings): """ @@ -1056,63 +639,21 @@ def validate_numbering(self, filepath, headings): level = heading.level numbers = heading.numbers - - # Expected depth based on heading level - # H2 => 1 number, H3 => 2 numbers, H4 => 3 numbers, etc. 
expected_depth = level - 1 actual_depth = len(numbers) - # Check depth matches heading level - if actual_depth != expected_depth: - error = ValidationIssue( - "heading_depth_mismatch", - Path(filepath), - heading.line_num, - heading.line_num, - f"H{level} heading has {actual_depth} number(s), " - f"expected {expected_depth}", - severity='error', - heading=heading.full_line, - heading_info=heading - ) - self.issues.append(error) - heading.issue = error - if self.first_error_line[filepath] is None: - self.first_error_line[filepath] = heading.line_num + if self._record_depth_mismatch_if_needed( + filepath, heading, level, + expected_depth=expected_depth, actual_depth=actual_depth + ): + continue + if self._record_no_parent_if_needed(filepath, heading, level): continue - # Check parent relationship - if level > 2: # H3 and beyond need to match parent - if heading.parent is None: - error = ValidationIssue( - "heading_no_parent_in_validation", - Path(filepath), - heading.line_num, - heading.line_num, - f"H{level} heading has no parent. 
" - "Please run a markdown linter to fix basic heading order.", - severity='error', - heading=heading.full_line, - heading_info=heading - ) - self.issues.append(error) - heading.issue = error - if self.first_error_line[filepath] is None: - self.first_error_line[filepath] = heading.line_num - continue - - # Check if original_number differs from corrected_number if heading.original_number != heading.corrected_number: - # Find previous heading at same level with same parent for better error message - prev_heading = None - if level > 2 and heading.parent: - for h in headings_by_line: - if h.line_num < heading.line_num and h.level == level: - if h.parent == heading.parent: - prev_heading = h - break - - # Build error message + prev_heading = self._find_prev_heading_same_level_parent( + headings_by_line, heading + ) if prev_heading: msg = (f"Non-sequential numbering: got '{heading.original_number}', " f"expected '{heading.corrected_number}' " @@ -1121,365 +662,66 @@ def validate_numbering(self, filepath, headings): msg = (f"Non-sequential numbering: got '{heading.original_number}', " f"expected '{heading.corrected_number}'") - error = ValidationIssue( + error = ValidationIssue.create( "heading_non_sequential", Path(filepath), heading.line_num, heading.line_num, - msg, + message=msg, severity='error', heading=heading.full_line, heading_info=heading ) - self.issues.append(error) - heading.issue = error - if self.first_error_line[filepath] is None: - self.first_error_line[filepath] = heading.line_num + self._record_heading_issue(filepath, heading, error) def check_excessive_numbering(self, filepath, headings): - """ - Check for H3+ headings where the depth-specific number in the - corrected numbering exceeds 20. This check is performed AFTER - corrected heading numbering is calculated. - - For H3 headings (e.g., "1.2"), checks the second number (index 1). - For H4 headings (e.g., "1.2.3"), checks the third number (index 2). 
- For H5 headings (e.g., "1.2.3.4"), checks the fourth number (index 3). - And so on. - """ - if not headings: - return - - # Check only H3+ headings (level >= 3) - h3_plus_headings = [h for h in headings if h.level >= 3] - if not h3_plus_headings: - return - - for heading in h3_plus_headings: - # Skip if corrected_number is not set - if not heading.corrected_number: - continue - - # Parse corrected_number string (e.g., "1.2.3" or "3.25") - try: - number_segments = [int(n) for n in heading.corrected_number.split('.')] - except (ValueError, AttributeError): - # If parsing fails, skip this heading - continue - - # For heading level L, check the segment at index (L - 2) - # H3 (level 3) has 2 numbers, check index 1 (second number) - # H4 (level 4) has 3 numbers, check index 2 (third number) - # H5 (level 5) has 4 numbers, check index 3 (fourth number) - segment_index = heading.level - 2 - - # Ensure we have enough segments - if segment_index >= len(number_segments): - continue - - # Check if the depth-specific segment exceeds maximum - segment = number_segments[segment_index] - if segment > MAX_HEADING_NUMBER_SEGMENT: - msg = ( - f"H{heading.level} heading has numbering " - f"'{heading.corrected_number}' where number {segment} " - f"(at depth {heading.level - 1}) exceeds " - f"{MAX_HEADING_NUMBER_SEGMENT}. " - "Consider restructuring the document to reduce nesting depth." - ) - warning = ValidationIssue( - "heading_excessive_numbering", - Path(filepath), - heading.line_num, - heading.line_num, - msg, - severity='warning', - heading=heading.full_line, - heading_info=heading - ) - self.issues.append(warning) + """Delegate to heading_numbering checks module.""" + _check_excessive_numbering(self.issues, filepath, headings) def check_single_word_headings(self, filepath, headings): - """ - Check for H4+ headings where the title (after numbering) is a single word. - This is a warning as single-word headings may be too vague or unclear. 
- """ - if not headings: - return - - # Check only H4+ headings (level >= 4) - h4_plus_headings = [h for h in headings if h.level >= 4] - if not h4_plus_headings: - return - - for heading in h4_plus_headings: - # heading_text contains the title after the number has been removed - if not heading.heading_text: - continue - - # Strip whitespace and check if it's a single word (no spaces) - title = heading.heading_text.strip() - if title and ' ' not in title: - msg = (f"H{heading.level} heading has a single-word title '{title}'. " - "Consider using a more descriptive multi-word heading.") - warning = ValidationIssue( - "heading_single_word", - Path(filepath), - heading.line_num, - heading.line_num, - msg, - severity='warning', - heading=heading.full_line, - heading_info=heading - ) - self.issues.append(warning) + """Delegate to heading_numbering checks module.""" + _check_single_word_headings(self.issues, filepath, headings) def check_duplicate_headings(self, filepath, headings): - """ - Check for duplicate headings (excluding numbering) across all levels. - All occurrences after the first are flagged as errors. 
- """ - if not headings: - return - - # Group headings by their title text (case-insensitive, normalized) - # heading_text contains the title after the number has been removed - heading_groups = defaultdict(list) - for heading in headings: - if not heading.heading_text: - continue - # Normalize: strip whitespace and convert to lowercase for comparison - normalized_title = heading.heading_text.strip().lower() - if normalized_title: - heading_groups[normalized_title].append(heading) - - # Find duplicates and flag all but the first occurrence as errors - for normalized_title, heading_list in heading_groups.items(): - if len(heading_list) > 1: - # Sort by line number to ensure first occurrence is the earliest - heading_list.sort(key=lambda h: h.line_num) - - # Flag all subsequent occurrences as errors - for duplicate_heading in heading_list[1:]: - # Find all other occurrences for the error message - other_locations = [ - f"line {h.line_num}" for h in heading_list - if h.line_num != duplicate_heading.line_num - ] - other_locations_str = ", ".join(other_locations) - - msg = (f"Duplicate heading title '{duplicate_heading.heading_text}' " - f"(also appears at {other_locations_str}). " - "Each heading should have a unique title.") - error = ValidationIssue( - "heading_duplicate", - Path(filepath), - duplicate_heading.line_num, - duplicate_heading.line_num, - msg, - severity='error', - heading=duplicate_heading.full_line, - heading_info=duplicate_heading - ) - self.issues.append(error) - if duplicate_heading.issue is None: - duplicate_heading.issue = error - if self.first_error_line[filepath] is None: - self.first_error_line[filepath] = duplicate_heading.line_num + """Delegate to heading_numbering checks module.""" + _check_duplicate_headings( + self.issues, self.first_error_line, filepath, headings + ) def is_go_code_related_heading(self, heading_text): - """ - Check if a heading is related to Go code elements. 
- - Returns True if the heading appears to reference a Go code element - (struct, function, method, interface, type) based on patterns that - match what the Go code blocks validator expects. - - This allows headings to use actual Go identifiers (like "readOnlyPackage") - instead of Title Case, which is required by the Go code blocks validator. - """ - if not heading_text: - return False - - # Patterns that indicate Go code elements: - # 1. camelCase identifiers (starts with lowercase, has uppercase in middle) - # Examples: "readOnlyPackage", "filePackage", "newFileEntry" - camel_case_pattern = r'\b[a-z][a-zA-Z]*[A-Z][a-zA-Z]*\b' - if re.search(camel_case_pattern, heading_text): - return True - - # 2. Method patterns: "Type.MethodName" (camelCase.TypeMethod) - method_pattern = r'\b[a-z][a-zA-Z]*\.[A-Z][a-zA-Z]*\b' - if re.search(method_pattern, heading_text): - return True - - # 3. Headings that end with Go kind words followed by actual identifiers - # Examples: "readOnlyPackage Struct", "filePackage Struct", "newFileEntry Function" - go_kind_words = ['Struct', 'Function', 'Method', 'Interface', 'Type'] - for kind_word in go_kind_words: - # Check if heading ends with "KindWord" and has camelCase before it - pattern = rf'\b[a-z][a-zA-Z]*\s+{kind_word}\b' - if re.search(pattern, heading_text): - return True - # Also check for "Type.MethodName Method" pattern - method_kind_pattern = rf'\b[a-z][a-zA-Z]*\.[A-Z][a-zA-Z]*\s+{kind_word}\b' - if re.search(method_kind_pattern, heading_text): - return True - - return False + """Return True if heading appears to reference a Go code element.""" + return _is_go_code_related_heading(heading_text) def check_heading_capitalization(self, filepath, headings): - """ - Check if headings follow Title Case capitalization. - Adds warnings for headings with incorrect capitalization. 
- - Skips capitalization checks for headings that reference Go code elements, - as those must use actual Go identifiers (not Title Case) to satisfy - the Go code blocks validator. - """ - if not headings: - return - - for heading in headings: - if not heading.heading_text: - continue - - # Skip capitalization check for Go code-related headings - # These headings must use actual Go identifiers (e.g., "readOnlyPackage Struct") - # instead of Title Case (e.g., "ReadOnlyPackage Struct") to satisfy - # the Go code blocks validator requirements - if self.is_go_code_related_heading(heading.heading_text): - continue - - is_correct, corrected = self.check_capitalization(heading.heading_text) - if not is_correct: - heading.corrected_capitalization = corrected - msg = (f"Incorrect capitalization: got '{heading.heading_text}', " - f"expected '{corrected}'") - warning = ValidationIssue( - "heading_capitalization", - Path(filepath), - heading.line_num, - heading.line_num, - msg, - severity='warning', - heading=heading.full_line, - heading_info=heading - ) - self.issues.append(warning) + """Delegate to heading_numbering checks module.""" + _check_heading_capitalization(self.issues, filepath, headings) def check_organizational_headings(self, filepath, headings): - """ - Check for organizational headings with no content. - Uses shared utility function is_organizational_heading. - Errors on headings that are purely organizational with no content. 
- """ - if not headings: - return - - # Read file content + """Delegate to heading_numbering checks module.""" try: with open(filepath, 'r', encoding='utf-8') as f: content = f.read() except (IOError, OSError) as e: - # File read errors - log and return self.log(f" Error reading file for organizational check: {e}") return except UnicodeDecodeError as e: - # Encoding errors - log and return self.log(f" Error decoding file for organizational check (encoding issue): {e}") return - except Exception as e: - # Unexpected errors - log and return + except (MemoryError, RuntimeError, BufferError) as e: self.log(f" Unexpected error reading file for organizational check: {e}") return - - # Build heading hierarchy using shared utility - # Convert HeadingInfo objects to (line_num, level, text) tuples - headings_for_hierarchy = [ - (h.line_num, h.level, h.heading_text) - for h in headings - ] - # Sort by line number - headings_for_hierarchy.sort(key=lambda x: x[0]) - hierarchy = build_heading_hierarchy(headings_for_hierarchy) - - # Check each heading - for heading in headings: - if heading.issue: # Skip headings that already have errors - continue - - # Check if heading is organizational - try: - result = is_organizational_heading( - content, - heading.line_num, - heading.level, - headings_for_hierarchy, - hierarchy, - max_prose_lines=MAX_ORGANIZATIONAL_PROSE_LINES - ) - - # Error on organizational headings with no content - if result.get('is_organizational') and result.get('is_empty'): - msg = ("Organizational heading with no content. 
" - "Headings should have substantive content or be removed.") - error = ValidationIssue( - "organizational_heading", - Path(filepath), - heading.line_num, - heading.line_num, - msg, - severity='error', - heading=heading.full_line, - heading_info=heading - ) - self.issues.append(error) - heading.issue = error - if self.first_error_line[filepath] is None: - self.first_error_line[filepath] = heading.line_num - except (ValueError, IndexError, KeyError) as e: - # Data structure errors - log but don't fail - self.log(f" Error checking organizational heading at line {heading.line_num}: {e}") - except Exception as e: - # Unexpected errors - log but don't fail - self.log( - f" Unexpected error checking organizational heading at line " - f"{heading.line_num}: {e}" - ) + _check_organizational_headings( + self.issues, + self.first_error_line, + filepath, + headings, + content, + log_fn=self.log, + ) def check_h2_period_consistency(self, filepath, headings): - """ - Check if H2 headings have consistent period usage. - If first H2 has period, all should have period. If not, none should. - Only called when there are no errors. - """ - h2_headings = [h for h in headings if h.level == 2] - if not h2_headings: - return - - h2_headings.sort(key=lambda h: h.line_num) - first_h2 = h2_headings[0] - expected_has_period = first_h2.has_period - - for heading in h2_headings[1:]: # Skip first one - if heading.has_period != expected_has_period: - expected_str = "with period" if expected_has_period else "without period" - actual_str = "with period" if heading.has_period else "without period" - msg = (f"H2 heading period inconsistency: first H2 is {expected_str}, " - f"but this heading is {actual_str}. 
" - f"All H2 headings should match the first one.") - warning = ValidationIssue( - "heading_period_inconsistency", - Path(filepath), - heading.line_num, - heading.line_num, - msg, - severity='warning', - heading=heading.full_line, - heading_info=heading - ) - self.issues.append(warning) + """Delegate to heading_numbering checks module.""" + _check_h2_period_consistency(self.issues, filepath, headings) def validate_file(self, filepath): """Validate heading numbering in a single markdown file.""" @@ -1575,467 +817,16 @@ def validate_all(self, root_dir, output, target_paths=None): def print_summary(self, output_builder): """Print validation summary using OutputBuilder.""" - # Filter issues by severity in a single loop - errors = [] - warnings = [] - for issue in self.issues: - if issue.matches(severity='error'): - errors.append(issue) - if issue.matches(severity='warning'): - warnings.append(issue) - - # Show errors first - if errors: - output_builder.add_errors_header() - output_builder.add_line( - f"Found {len(errors)} error(s):", - section="error" - ) - output_builder.add_blank_line("error") - - # Group errors by issue type (for counting) - errors_by_type = defaultdict(list) - for error in errors: - errors_by_type[error.issue_type].append(error) - - # Group errors by (file, line_num) to combine multiple errors on same line - errors_by_line = defaultdict(list) - for error in errors: - key = (error.file, error.start_line) - errors_by_line[key].append(error) - - # Display organizational heading errors first - org_errors = errors_by_type.get("organizational_heading", []) - if org_errors: - output_builder.add_line( - f"Organizational Heading Errors ({len(org_errors)}):", - section="error" - ) - output_builder.add_blank_line("error") - - # Get organizational errors grouped by line - org_errors_by_line = { - k: v for k, v in errors_by_line.items() - if any( - isinstance(e, ValidationIssue) and - e.issue_type == "organizational_heading" - for e in v - ) - } - - # Sort by 
file and line number - sorted_org_lines = sorted(org_errors_by_line.keys(), key=lambda k: (k[0], k[1])) - - for file, line_num in sorted_org_lines: - line_errors = [ - e for e in org_errors_by_line[(file, line_num)] - if isinstance(e, ValidationIssue) and - e.issue_type == "organizational_heading" - ] - if not line_errors: - continue - - rel_file = self.rel_path(file) - # Combine all error messages for this line - messages = [ - e.message if isinstance(e, ValidationIssue) - else e.get('message', '') - for e in line_errors - ] - combined_message = "; ".join(messages) - - # Get suggestion from first error with heading_info - suggestion = None - for error in line_errors: - heading_info = error.extra_fields.get('heading_info') - if heading_info: - suggestion = self.build_corrected_full_line(heading_info) - break - - error_msg = format_issue_message( - "error", - "Organizational heading", - rel_file, - line_num, - combined_message, - suggestion, - self.no_color - ) - output_builder.add_error_line(error_msg) - - output_builder.add_blank_line("error") - - # Display heading formatting errors (e.g., backticks) - formatting_errors = errors_by_type.get("heading_formatting", []) - if formatting_errors: - if org_errors: - output_builder.add_separator(section="error") - output_builder.add_blank_line("error") - - output_builder.add_line( - f"Heading Formatting Errors ({len(formatting_errors)}):", - section="error" - ) - output_builder.add_blank_line("error") - - # Get formatting errors grouped by line - formatting_errors_by_line = { - k: v for k, v in errors_by_line.items() - if any( - isinstance(e, ValidationIssue) and - e.issue_type == "heading_formatting" - for e in v - ) - } - - # Sort by file and line number - sorted_formatting_lines = sorted( - formatting_errors_by_line.keys(), key=lambda k: (k[0], k[1]) - ) - - for file, line_num in sorted_formatting_lines: - line_errors = [ - e for e in formatting_errors_by_line[(file, line_num)] - if isinstance(e, ValidationIssue) and - 
e.issue_type == "heading_formatting" - ] - if not line_errors: - continue - - rel_file = self.rel_path(file) - # Combine all error messages for this line - messages = [ - e.message if isinstance(e, ValidationIssue) - else e.get('message', '') - for e in line_errors - ] - combined_message = "; ".join(messages) - - # Get suggestion from first error with heading_info - suggestion = None - for error in line_errors: - heading_info = error.extra_fields.get('heading_info') - if heading_info: - suggestion = self.build_corrected_full_line(heading_info) - break - - error_msg = format_issue_message( - "error", - "Heading formatting", - rel_file, - line_num, - combined_message, - suggestion, - self.no_color - ) - output_builder.add_error_line(error_msg) - - output_builder.add_blank_line("error") - - # Display heading numbering errors - # Get all numbering-related errors - # (everything except organizational and formatting) - numbering_errors = [ - e for e in errors - if isinstance(e, ValidationIssue) and - e.issue_type not in ( - "organizational_heading", "heading_formatting" - ) - ] - if numbering_errors: - if org_errors or formatting_errors: - output_builder.add_separator(section="error") - output_builder.add_blank_line("error") - - output_builder.add_line( - f"Heading Numbering Errors ({len(numbering_errors)}):", - section="error" - ) - output_builder.add_blank_line("error") - - # Get numbering errors grouped by line - numbering_errors_by_line = { - k: v for k, v in errors_by_line.items() - if any( - isinstance(e, ValidationIssue) and - e.issue_type not in ( - "organizational_heading", "heading_formatting" - ) - for e in v - ) - } - - # Sort by file and line number - sorted_numbering_lines = sorted( - numbering_errors_by_line.keys(), key=lambda k: (k[0], k[1]) - ) - - for file, line_num in sorted_numbering_lines: - # All errors in numbering_errors_by_line are already numbering errors - # (filtered to exclude organizational_heading and heading_formatting) - line_errors = 
numbering_errors_by_line[(file, line_num)] - if not line_errors: - continue - - rel_file = self.rel_path(file) - # Combine all error messages for this line - messages = [ - e.message if isinstance(e, ValidationIssue) - else e.get('message', '') - for e in line_errors - ] - combined_message = "; ".join(messages) - - # Get suggestion from first error with heading_info - suggestion = None - for error in line_errors: - heading_info = error.extra_fields.get('heading_info') - if heading_info: - suggestion = self.build_corrected_full_line(heading_info) - break - - error_msg = format_issue_message( - "error", - "Heading numbering", - rel_file, - line_num, - combined_message, - suggestion, - self.no_color - ) - output_builder.add_error_line(error_msg) - - output_builder.add_blank_line("error") - - # Show sorted headings from first error for each file - # (only for numbering errors, not organizational) - if numbering_errors: - for filepath in sorted(self.headings_from_first_error.keys()): - errored_headings = self.headings_from_first_error[filepath] - if not errored_headings: - continue - - # Filter out headings that only have duplicate errors (no numbering errors) - # Only include headings that have numbering errors (original != corrected) - # Also include unnumbered headings (original_number == "MISSING") - headings_with_numbering_errors = [] - for heading in errored_headings: - # Check if this heading has a numbering error - # Include if original_number is "MISSING" (unnumbered heading) - if heading.original_number == "MISSING" and heading.corrected_number: - headings_with_numbering_errors.append(heading) - elif heading.original_number and heading.corrected_number: - current_for_comparison = heading.original_number.rstrip('.') - correct_for_comparison = heading.corrected_number.rstrip('.') - has_numbering_error = (current_for_comparison != correct_for_comparison) - if has_numbering_error: - headings_with_numbering_errors.append(heading) - - # Skip this section if there 
are no numbering errors (only duplicate errors) - if not headings_with_numbering_errors: - continue - - first_error_line = self.first_error_line[filepath] - rel_file = self.rel_path(filepath) - - # CRITICAL: This format must be preserved for - # apply_heading_corrections.py parsing - output_builder.add_separator(section="error") - output_builder.add_line( - f"Sorted headings from first error (line {first_error_line}) " - f"in {rel_file}:", - section="error" - ) - output_builder.add_separator(section="error") - output_builder.add_blank_line("error") - output_builder.add_line( - "The following headings should be in this order " - "(sorted by numeric values):", - section="error" - ) - output_builder.add_blank_line("error") - output_builder.add_line( - "Format: Line X: [CURRENT] -> [CORRECT] Title", - section="error" - ) - output_builder.add_blank_line("error") - - # Sort headings with numbering errors for display: first by line number - # (document order), then by numeric values - sorted_headings = sorted( - headings_with_numbering_errors, - key=lambda h: (h.line_num, h.sort_key()) - ) - - # Display in sorted order with correct numbering - # Find max line number for alignment - max_line_num = ( - max(h.line_num for h in sorted_headings) - if sorted_headings else 0 - ) - line_num_width = len(str(max_line_num)) - - # Determine period pattern from first H2 heading for display - h2_headings_in_output = [h for h in sorted_headings if h.level == 2] - display_period = False - if h2_headings_in_output: - first_h2 = min(h2_headings_in_output, key=lambda h: h.line_num) - display_period = first_h2.has_period - - for heading in sorted_headings: - current_number_str = heading.original_number - # corrected_number should always be set after - # calculate_corrected_numbers - # If it's not set, that's a bug, but use original_number - # as fallback for safety - if heading.corrected_number is None: - correct_number_str = current_number_str - else: - correct_number_str = 
heading.corrected_number - - # For unnumbered headings, display "MISSING" as current - if current_number_str == "MISSING": - current_display = "MISSING" - # For display, add period to H2 headings if first H2 has period - elif heading.level == 2 and display_period: - current_display = f"{current_number_str}." - else: - current_display = current_number_str - - # For comparison, strip periods from both - current_for_comparison = current_number_str.rstrip('.') - correct_for_comparison = correct_number_str.rstrip('.') - - # Determine if numbering needs to change - needs_change = (current_for_comparison != correct_for_comparison) - - # Check if this is a duplicate heading error (not a numbering error) - is_duplicate_error = ( - heading.issue and - isinstance(heading.issue, ValidationIssue) and - heading.issue.matches(issue_type="heading_duplicate") - ) - - # For display, add period to H2 headings if first H2 has period - if heading.level == 2 and display_period: - correct_display = f"{correct_number_str}." 
- else: - correct_display = correct_number_str - - # Get heading text with capitalization correction if available - heading_text_display = heading.heading_text - if heading.corrected_capitalization: - heading_text_display = heading.corrected_capitalization - - # Use different format for duplicate errors when - # numbering is correct - # This format won't match the correction pattern in - # apply_heading_corrections.py - if is_duplicate_error and not needs_change: - output_builder.add_error_line( - f"Line {heading.line_num:{line_num_width}d}: " - f"{'#' * heading.level} [{current_display}] (DUPLICATE) " - f"{heading_text_display}" - ) - else: - # Standard correction format for numbering errors - # Include capitalization correction in heading text if available - output_builder.add_error_line( - f"Line {heading.line_num:{line_num_width}d}: " - f"{'#' * heading.level} [{current_display}] -> " - f"[{correct_display}] {heading_text_display}" - ) - - output_builder.add_blank_line("error") - - # Show warnings - if warnings: - output_builder.add_warnings_header() - output_builder.add_line( - f"Found {len(warnings)} warning(s):", - section="warning" - ) - output_builder.add_blank_line("warning") - - # Group warnings by (file, line_num) to combine multiple warnings on same line - warnings_by_line = defaultdict(list) - for warning in warnings: - # warning is a ValidationIssue - if isinstance(warning, ValidationIssue): - file_key = warning.file - line_key = warning.start_line - else: - file_key = warning.get('file', '') - line_key = warning.get('line_num', 0) - key = (file_key, line_key) - warnings_by_line[key].append(warning) - - # Sort by file and line number for consistent output - sorted_warning_lines = sorted(warnings_by_line.keys(), key=lambda k: (k[0], k[1])) - - for file, line_num in sorted_warning_lines: - line_warnings = warnings_by_line[(file, line_num)] - rel_file = self.rel_path(file) - - # Combine all warning messages for this line - messages = [] - for warning in 
line_warnings: - # warning is a ValidationIssue - if isinstance(warning, ValidationIssue): - message = warning.message - else: - message = warning.get('message', '') - # Extract expected value from message if it contains "expected" - # Remove the "expected" part from message, keeping "got 'X'" - if "expected" in message.lower(): - message = re.sub( - r",\s*expected\s+['\"][^'\"]+['\"]", "", message - ) - messages.append(message) - - combined_message = "; ".join(messages) - - # Get suggestion from first warning with heading_info - suggestion = None - for warning in line_warnings: - if isinstance(warning, ValidationIssue): - heading_info = warning.extra_fields.get('heading_info') - else: - heading_info = getattr(warning, 'heading_info', None) - if heading_info: - suggestion = self.build_corrected_full_line(heading_info) - break - - warning_msg = format_issue_message( - "warning", - "Heading numbering", - rel_file, - line_num, - combined_message, - suggestion, - self.no_color - ) - output_builder.add_warning_line(warning_msg) - - # Overall status - if not errors and not warnings: - # Add summary section with statistics - files_checked = len(self.all_headings) - total_headings = sum(len(headings) for headings in self.all_headings.values()) - summary_items = [ - ("Files checked:", files_checked), - ("Headings checked:", total_headings), - ] - output_builder.add_summary_header() - output_builder.add_summary_section(summary_items) - output_builder.add_success_message("All heading numbering is valid!") - elif not errors: - output_builder.add_line( - "No heading numbering errors found (only warnings).", - section="final_message" - ) - else: - output_builder.add_failure_message("Validation failed. 
Please fix the errors above.") + _print_summary_report( + self.issues, + self.all_headings, + self.headings_from_first_error, + self.first_error_line, + rel_path_fn=self.rel_path, + build_corrected_full_line_fn=self.build_corrected_full_line, + no_color=self.no_color, + output_builder=output_builder + ) def show_help(): @@ -2045,8 +836,6 @@ def show_help(): def main(): """Main entry point.""" - import argparse - parser = argparse.ArgumentParser( description='Validate markdown heading numbering consistency', add_help=False diff --git a/scripts/validate_links.py b/scripts/validate_links.py index b458b649..02b7bceb 100644 --- a/scripts/validate_links.py +++ b/scripts/validate_links.py @@ -7,7 +7,6 @@ Usage: python3 validate_links.py [options] - Options: --verbose, -v Show detailed progress information --output, -o FILE Write detailed output to FILE @@ -17,17 +16,13 @@ path or comma-separated list of paths --nocolor, --no-color Disable colored output --help, -h Show this help message - Examples: # Basic validation python3 tmp/validate_links.py - # Save output to file python3 tmp/validate_links.py --output tmp/validation_report.txt - # Check requirements coverage python3 tmp/validate_links.py --check-coverage - # Verbose with coverage check python3 tmp/validate_links.py --verbose --check-coverage @@ -41,295 +36,21 @@ python3 tmp/validate_links.py --path docs/requirements,docs/tech_specs """ -import re import os import sys from pathlib import Path from collections import defaultdict from dataclasses import dataclass -from typing import List - -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" - -# Import shared utilities -for module_path in (str(scripts_dir), str(lib_dir)): - if module_path not in sys.path: - sys.path.insert(0, module_path) +from typing import Dict, List, Optional, Tuple -# Import shared utilities -from lib._validation_utils import ( # noqa: E402 +from lib._validation_utils import ( OutputBuilder, parse_no_color_flag, 
format_issue_message, parse_paths, ValidationIssue, find_markdown_files, is_safe_path, validate_file_name, validate_anchor, - extract_headings_with_anchors, FileContentCache + FileContentCache ) -from lib._link_extraction import extract_links # noqa: E402 - - -# Compiled regex patterns for performance (module level) -_RE_MARKDOWN_FORMAT = re.compile(r'[*_`]') -_RE_SPECIAL_CHARS = re.compile(r'[^a-zA-Z0-9 .-]') -_RE_SPLIT_WORDS = re.compile(r'[\s.\-]+') -_RE_CAMEL_CASE = re.compile(r'([a-z])([A-Z])') -_RE_NUMBERING_PREFIX = re.compile(r'^([0-9]+(?:\.[0-9]+)*)\.?\s+(.+)$') - - -# extract_headings() now uses shared utility extract_headings_with_anchors() -def extract_headings(file_path, file_cache=None): - """ - Extract all headings from a markdown file and generate anchors. - - Args: - file_path: Path to the file - file_cache: Optional FileContentCache instance to use for reading files - - Returns: - dict: Mapping of anchor -> (heading_text, heading_level, line_number) - """ - return extract_headings_with_anchors(Path(file_path), file_cache=file_cache) - - -def normalize_text_for_matching(text: str) -> str: - """ - Normalize text for matching by removing markdown formatting, - converting to lowercase, and removing common words. - - Args: - text: Text to normalize - - Returns: - Normalized text string - """ - # Remove markdown formatting - text = _RE_MARKDOWN_FORMAT.sub('', text) - # Convert to lowercase - text = text.lower() - # Remove special characters except spaces, hyphens, and dots - text = _RE_SPECIAL_CHARS.sub('', text) - return text.strip() - - -def extract_words(text: str) -> List[str]: - """ - Extract words from text, handling various separators. 
- - Handles: - - Spaces: "Add File" -> ["add", "file"] - - Dots: "Package.AddFile" -> ["package", "add", "file"] - - Hyphens: "add-file" -> ["add", "file"] - - CamelCase: "AddFile" -> ["add", "file"] - - Mixed: "Package.AddFile" -> ["package", "add", "file"] - - Args: - text: Text to extract words from - - Returns: - List of normalized words (all lowercase) - """ - # Remove markdown formatting first (preserve case for camelCase detection) - text = _RE_MARKDOWN_FORMAT.sub('', text) - - # Split on spaces, dots, and hyphens (preserve case) - words = _RE_SPLIT_WORDS.split(text) - - # Further split camelCase words and normalize to lowercase - all_words = [] - for word in words: - if not word: - continue - # Split camelCase: "AddFile" -> ["Add", "File"], then lowercase - camel_split = _RE_CAMEL_CASE.sub(r'\1 \2', word) - camel_words = camel_split.split() - # Convert to lowercase and add - all_words.extend([w.lower() for w in camel_words if w]) - - # Filter out empty strings and common stop words - stop_words = {'the', 'a', 'an', 'and', 'or', 'of', 'in', 'on', 'at', 'to', 'for', 'with', 'by'} - return [w for w in all_words if w and w not in stop_words] - - -def strip_numbering_prefix(text: str) -> str: - """ - Strip numbering prefix from heading text (e.g., "1.2.3 Add File" -> "Add File"). - - Args: - text: Heading text that may contain numbering - - Returns: - Text with numbering prefix removed - """ - # Match pattern like "1.2.3 " or "1 " at the start - match = _RE_NUMBERING_PREFIX.match(text) - if match: - return match.group(2).strip() - return text - - -def calculate_word_match_score(link_words: List[str], heading_words: List[str]) -> float: - """ - Calculate word matching score between link text and heading text. 
- - Args: - link_words: List of words from link text - heading_words: List of words from heading text - - Returns: - Score from 0-100 based on word matching - """ - if not link_words or not heading_words: - return 0.0 - - link_set = set(link_words) - heading_set = set(heading_words) - - # Exact set match - if link_set == heading_set: - return 100.0 - - # All link words in heading - if link_set.issubset(heading_set): - return 90.0 - - # Most link words in heading - matching_words = link_set.intersection(heading_set) - if matching_words: - match_ratio = len(matching_words) / len(link_set) - return 60.0 + (match_ratio * 30.0) # 60-90 range - - # Partial word matches (substring matching) - partial_matches = 0 - for link_word in link_words: - for heading_word in heading_words: - if link_word in heading_word or heading_word in link_word: - partial_matches += 1 - break - - if partial_matches > 0: - partial_ratio = partial_matches / len(link_words) - return 20.0 + (partial_ratio * 40.0) # 20-60 range - - return 0.0 - - -def suggest_anchor(link_text, broken_anchor, target_file, heading_cache, verbose=False): - """ - Suggest correct anchor based on weighted heuristics. - - Args: - link_text: Text from the markdown link - broken_anchor: The broken anchor that was not found - target_file: Path to the target file - heading_cache: Dictionary mapping file paths to heading dictionaries - (anchor -> (heading_text, heading_level, line_number)) - verbose: If True, return detailed scoring information - - Returns: - Tuple of (suggested_anchor, confidence_score) or None if no good match found. 
- If verbose=True, returns (suggested_anchor, confidence_score, score_details) - """ - # Get headings for target file - headings_dict = heading_cache.get(str(target_file), {}) - if not headings_dict: - return None - - # Normalize link text - normalized_link_text = normalize_text_for_matching(link_text) - link_words = extract_words(link_text) - - # Extract words from broken anchor (for additional matching) - broken_anchor_words = extract_words(broken_anchor.replace('-', ' ')) - - best_match = None - best_score = 0.0 - best_details = {} - - for anchor, (heading_text, heading_level, line_num) in headings_dict.items(): - # Skip if anchor is invalid (shouldn't happen, but safety check) - if not validate_anchor(anchor): - continue - - # Strip numbering from heading text - heading_text_no_numbering = strip_numbering_prefix(heading_text) - - # Normalize heading text - normalized_heading = normalize_text_for_matching(heading_text_no_numbering) - heading_words = extract_words(heading_text_no_numbering) - - # Calculate scores for different heuristics - scores = {} - - # Heuristic 1: Word matching (weight: 0.4) - word_score = calculate_word_match_score(link_words, heading_words) - scores['word_match'] = word_score - weighted_word = word_score * 0.4 - - # Heuristic 2: Anchor similarity (weight: 0.3) - anchor_score = 0.0 - if anchor == broken_anchor: - anchor_score = 100.0 - elif broken_anchor in anchor: - anchor_score = 70.0 - elif anchor in broken_anchor: - anchor_score = 50.0 - else: - # Check word overlap in anchors - anchor_words = extract_words(anchor.replace('-', ' ')) - anchor_word_score = calculate_word_match_score(broken_anchor_words, anchor_words) - anchor_score = anchor_word_score * 0.5 # Lower weight for anchor word matching - scores['anchor_similarity'] = anchor_score - weighted_anchor = anchor_score * 0.3 - - # Heuristic 3: Context matching (weight: 0.2) - context_score = 0.0 - # Prefer H2/H3 headings (more likely to be main sections) - if heading_level == 2: - 
context_score += 30.0 - elif heading_level == 3: - context_score += 20.0 - elif heading_level == 4: - context_score += 10.0 - # Slight preference for earlier headings (but minimal impact) - if line_num < 100: - context_score += 5.0 - scores['context'] = context_score - weighted_context = context_score * 0.2 - - # Heuristic 4: Normalization quality (weight: 0.1) - # Check if normalized versions are similar - norm_score = 0.0 - if normalized_link_text == normalized_heading: - norm_score = 100.0 - elif (normalized_link_text in normalized_heading or - normalized_heading in normalized_link_text): - norm_score = 60.0 - scores['normalization'] = norm_score - weighted_norm = norm_score * 0.1 - - # Calculate total weighted score - total_score = weighted_word + weighted_anchor + weighted_context + weighted_norm - - if total_score > best_score: - best_score = total_score - best_match = anchor - best_details = { - 'heading_text': heading_text, - 'heading_level': heading_level, - 'line_num': line_num, - 'scores': scores, - 'total_score': total_score - } - - # Only return if score is above 70% threshold - if best_match and best_score >= 70.0: - if verbose: - return (best_match, best_score, best_details) - return (best_match, best_score) - - return None - - -# Path validation functions now imported from _validation_utils +from lib._link_extraction import extract_links +from lib._validate_links_helpers import extract_headings, suggest_anchor def resolve_path(base_file, relative_path, repo_root: Path = None): @@ -397,12 +118,12 @@ def validate_requirements_coverage( break if not has_tech_spec_link: - issues.append(ValidationIssue( + issues.append(ValidationIssue.create( 'no_tech_spec_refs', req_file, 1, 1, - 'No tech spec references found', + message='No tech spec references found', severity='warning' )) @@ -450,12 +171,12 @@ def validate_internal_anchor_link( """Validate an anchor-only link within the same file.""" anchor = link_target[1:] if not validate_anchor(anchor): - return 
[ValidationIssue( + return [ValidationIssue.create( 'unsafe_anchor', ctx.md_file, line_num, line_num, - f"Unsafe internal anchor: #{anchor}", + message=f"Unsafe internal anchor: #{anchor}", severity='error', suggestion=None, target=link_target, @@ -469,12 +190,12 @@ def validate_internal_anchor_link( suggestion, details = build_anchor_suggestion( link_text, anchor, ctx.md_file, ctx, "#" ) - return [ValidationIssue( + return [ValidationIssue.create( 'internal_anchor', ctx.md_file, line_num, line_num, - f"Broken internal anchor #{anchor}", + message=f"Broken internal anchor #{anchor}", severity='error', suggestion=suggestion, target=link_target, @@ -499,12 +220,12 @@ def validate_file_link( """Validate a file link, including optional anchors.""" link_path, anchor = split_link_target(link_target) if anchor and not validate_anchor(anchor): - return [ValidationIssue( + return [ValidationIssue.create( 'unsafe_anchor', ctx.md_file, line_num, line_num, - f"Unsafe anchor: {anchor}", + message=f"Unsafe anchor: {anchor}", severity='error', suggestion=None, target=link_target, @@ -514,12 +235,12 @@ def validate_file_link( if not link_path.endswith('/'): filename = os.path.basename(link_path) if filename and not validate_file_name(filename): - return [ValidationIssue( + return [ValidationIssue.create( 'unsafe_path', ctx.md_file, line_num, line_num, - f"Unsafe or invalid filename: {filename}", + message=f"Unsafe or invalid filename: {filename}", severity='error', suggestion=None, target=link_target, @@ -528,12 +249,12 @@ def validate_file_link( resolved_path = resolve_path(str(ctx.md_file), link_path, ctx.repo_root) if resolved_path is None: - return [ValidationIssue( + return [ValidationIssue.create( 'unsafe_path', ctx.md_file, line_num, line_num, - f"Unsafe or invalid path: {link_path}", + message=f"Unsafe or invalid path: {link_path}", severity='error', suggestion=None, target=link_target, @@ -541,12 +262,12 @@ def validate_file_link( )] if not os.path.exists(resolved_path): 
- return [ValidationIssue( + return [ValidationIssue.create( 'missing_file', ctx.md_file, line_num, line_num, - f"File not found: {link_path}", + message=f"File not found: {link_path}", severity='error', suggestion=None, target=link_target, @@ -567,12 +288,12 @@ def validate_file_link( suggestion, details = build_anchor_suggestion( link_text, anchor, resolved_path, ctx, f"{link_path}#" ) - return [ValidationIssue( + return [ValidationIssue.create( 'broken_anchor', ctx.md_file, line_num, line_num, - f"Broken anchor in {link_path}#{anchor}", + message=f"Broken anchor in {link_path}#{anchor}", severity='error', suggestion=suggestion, target=link_target, @@ -600,6 +321,305 @@ def validate_link_target( ) +def _parse_cli_args( + argv: List[str], +) -> Tuple[bool, bool, bool, bool, Optional[str], Optional[str]]: + verbose = '--verbose' in argv or '-v' in argv + check_coverage = '--check-coverage' in argv + no_color = parse_no_color_flag(argv) + no_fail = '--no-fail' in argv + output_file = None + target_paths_str = None + for i, arg in enumerate(argv): + if arg in ('--output', '-o') and i + 1 < len(argv): + output_file = argv[i + 1] + elif arg in ('--path', '-p') and i + 1 < len(argv): + target_paths_str = argv[i + 1] + return verbose, check_coverage, no_color, no_fail, output_file, target_paths_str + + +def _build_output( + verbose: bool, + no_color: bool, + output_file: Optional[str], +) -> OutputBuilder: + return OutputBuilder( + "Link and Anchor Validation", + "Validates all markdown links and anchors", + no_color=no_color, + verbose=verbose, + output_file=output_file + ) + + +def _default_exclude_dirs() -> set[str]: + return { + '.git', 'node_modules', 'vendor', '.venv', 'venv', + '__pycache__', '.pytest_cache', 'dist', 'build', + '.idea', '.vscode', 'tmp', '.cache' + } + + +def _warn_non_markdown_targets( + target_paths: List[str], + output: OutputBuilder, +) -> None: + for target_path in target_paths: + target = Path(target_path) + if target.exists() and 
target.is_file() and target.suffix != '.md': + output.add_warning_line( + f"Target file is not a markdown file: {target_path}" + ) + + +def _build_anchor_cache( + md_files: List[Path], + file_cache: FileContentCache, + output: OutputBuilder, + verbose: bool, +) -> Dict[str, dict]: + output.add_verbose_line("Building anchor cache...") + anchor_cache: Dict[str, dict] = {} + for md_file in md_files: + headings_dict = extract_headings(str(md_file), file_cache) + anchor_cache[str(md_file)] = headings_dict + if verbose: + total_anchors = sum(len(headings) for headings in anchor_cache.values()) + output.add_verbose_line( + f" Cached {total_anchors} anchors from {len(anchor_cache)} files" + ) + output.add_blank_line("working_verbose") + return anchor_cache + + +def _validate_links( + md_files: List[Path], + anchor_cache: Dict[str, dict], + file_cache: FileContentCache, + root_dir: Path, + verbose: bool, +) -> Tuple[List[ValidationIssue], int, int]: + broken_links: List[ValidationIssue] = [] + total_links = 0 + files_with_links = 0 + for md_file in md_files: + links = extract_links(str(md_file), file_cache) + if links: + files_with_links += 1 + ctx = LinkValidationContext( + md_file=md_file, + anchor_cache=anchor_cache, + file_cache=file_cache, + repo_root=root_dir, + verbose=verbose + ) + for link_text, link_target, line_num in links: + total_links += 1 + broken_links.extend(validate_link_target( + link_text, + link_target, + line_num, + ctx + )) + return broken_links, total_links, files_with_links + + +def _check_coverage_if_requested( + check_coverage: bool, + *, + requirements_files: List[Path], + tech_spec_files: List[Path], + anchor_cache: Dict[str, dict], + file_cache: FileContentCache, + output: OutputBuilder, +) -> List: + if not check_coverage: + return [] + output.add_verbose_line("Checking requirements coverage...") + coverage_issues = validate_requirements_coverage( + requirements_files, + tech_spec_files, + anchor_cache, + file_cache + ) + 
output.add_blank_line("working_verbose") + return coverage_issues + + +def _format_broken_link_issue( + broken: ValidationIssue, + file_path: str, + no_color: bool, +) -> str: + issue_msg = broken.message + if " in " in issue_msg: + link_info = issue_msg.split(" in ", 1)[1] + else: + link_info = issue_msg + return format_issue_message( + "error", + "Broken link", + file_path, + line_num=broken.start_line, + message=link_info, + suggestion=broken.suggestion, + no_color=no_color + ) + + +def _emit_suggestion_details( + output: OutputBuilder, + broken: ValidationIssue, + verbose: bool, +) -> None: + if not (verbose and broken.extra_fields.get('suggestion_details')): + return + details = broken.extra_fields['suggestion_details'] + scores = details.get('scores', {}) + output.add_verbose_line( + f" Suggestion scores: word_match={scores.get('word_match', 0):.1f}, " + f"anchor_similarity={scores.get('anchor_similarity', 0):.1f}, " + f"context={scores.get('context', 0):.1f}, " + f"normalization={scores.get('normalization', 0):.1f}, " + f"total={details.get('total_score', 0):.1f}" + ) + + +def _emit_broken_links_group( + output: OutputBuilder, + section_title: str, + *, + file_paths: List[str], + by_file: Dict[str, List[ValidationIssue]], + no_color: bool, + verbose: bool, + add_blank: bool = False, +) -> None: + if not file_paths: + return + if add_blank: + output.add_blank_line("error") + output.add_line(section_title, section="error") + for file_path in file_paths: + output.add_error_line(f"{file_path}:") + for broken in by_file[file_path]: + error_output = _format_broken_link_issue(broken, file_path, no_color) + output.add_error_line(error_output) + _emit_suggestion_details(output, broken, verbose) + + +def _emit_coverage_warnings( + output: OutputBuilder, + coverage_issues: List, + no_color: bool, +) -> None: + if not coverage_issues: + return + output.add_warnings_header() + output.add_line( + "The following requirements files don't reference any tech specs:", + 
section="warning" + ) + output.add_blank_line("warning") + for issue in coverage_issues: + if isinstance(issue, ValidationIssue): + warning_msg = issue.format_message(no_color=no_color) + else: + warning_msg = format_issue_message( + "warning", + "No tech spec refs", + issue.get('file', ''), + message=issue.get('issue', 'No tech spec references found'), + no_color=no_color + ) + output.add_warning_line(warning_msg) + output.add_blank_line("warning") + output.add_line( + "Note: After adding tech spec references, verify that each " + "reference points to the correct content.", + section="warning" + ) + + +def _emit_broken_links( + output: OutputBuilder, + broken_links: List[ValidationIssue], + no_color: bool, + verbose: bool, +) -> None: + if not broken_links: + return + output.add_errors_header() + by_file: Dict[str, List[ValidationIssue]] = defaultdict(list) + for broken in broken_links: + by_file[broken.file].append(broken) + + req_files = sorted([f for f in by_file.keys() if 'requirements/' in f]) + spec_files = sorted([f for f in by_file.keys() if 'tech_specs/' in f]) + docs_root_files = sorted([ + f for f in by_file.keys() + if f.startswith('docs/') and '/' not in f[5:] + ]) + other_files_broken = sorted([ + f for f in by_file.keys() + if f not in req_files + and f not in spec_files + and f not in docs_root_files + ]) + + _emit_broken_links_group( + output, + "## Requirements Files", + file_paths=req_files, + by_file=by_file, + no_color=no_color, + verbose=verbose + ) + _emit_broken_links_group( + output, + "## Tech Spec Files", + file_paths=spec_files, + by_file=by_file, + no_color=no_color, + verbose=verbose, + add_blank=bool(req_files) + ) + _emit_broken_links_group( + output, + "## Root Documentation Files", + file_paths=docs_root_files, + by_file=by_file, + no_color=no_color, + verbose=verbose, + add_blank=bool(req_files or spec_files) + ) + _emit_broken_links_group( + output, + "## Other Files", + file_paths=other_files_broken, + by_file=by_file, + 
no_color=no_color, + verbose=verbose, + add_blank=bool(req_files or spec_files or docs_root_files) + ) + + has_tech_spec_links = any( + 'tech_specs/' in ( + broken.extra_fields.get('target', '') + if isinstance(broken, ValidationIssue) + else broken.get('target', '') + ) + for broken in broken_links + ) + if has_tech_spec_links: + output.add_blank_line("error") + output.add_line( + "Note: After fixing broken links to tech specs, verify that each " + "updated reference points to the correct content.", + section="error" + ) + + def main(): """Main validation function.""" # Show help if requested @@ -607,42 +627,22 @@ def main(): print(__doc__) return 0 - # Parse command line arguments - verbose = '--verbose' in sys.argv or '-v' in sys.argv - check_coverage = '--check-coverage' in sys.argv - no_color = parse_no_color_flag(sys.argv) - no_fail = '--no-fail' in sys.argv - output_file = None - target_paths_str = None - - for i, arg in enumerate(sys.argv): - if arg in ('--output', '-o') and i + 1 < len(sys.argv): - output_file = sys.argv[i + 1] - elif arg in ('--path', '-p') and i + 1 < len(sys.argv): - target_paths_str = sys.argv[i + 1] + verbose, check_coverage, no_color, no_fail, output_file, target_paths_str = ( + _parse_cli_args(sys.argv) + ) # Parse comma-separated paths target_paths = parse_paths(target_paths_str) # Create output builder (header streams immediately if verbose) - output = OutputBuilder( - "Link and Anchor Validation", - "Validates all markdown links and anchors", - no_color=no_color, - verbose=verbose, - output_file=output_file - ) + output = _build_output(verbose, no_color, output_file) # Find all markdown files in the repository # Start from current directory (repository root when called from Makefile) root_dir = Path(".") # Directories to exclude from scanning (only when no target path is specified) - exclude_dirs = { - '.git', 'node_modules', 'vendor', '.venv', 'venv', - '__pycache__', '.pytest_cache', 'dist', 'build', - '.idea', '.vscode', 
'tmp', '.cache' - } + exclude_dirs = _default_exclude_dirs() # Find all markdown files using shared utility md_files = find_markdown_files( @@ -654,12 +654,7 @@ def main(): # Handle warnings for non-markdown files when target_paths is specified if target_paths: - for target_path in target_paths: - target = Path(target_path) - if target.exists() and target.is_file() and target.suffix != '.md': - output.add_warning_line( - f"Target file is not a markdown file: {target_path}" - ) + _warn_non_markdown_targets(target_paths, output) if not md_files: output.add_error_line("No markdown files found") @@ -682,57 +677,26 @@ def main(): output.add_blank_line("working_verbose") # Build anchor cache (now includes heading text and metadata) - output.add_verbose_line("Building anchor cache...") - anchor_cache = {} - for md_file in md_files: - headings_dict = extract_headings(str(md_file), file_cache) - anchor_cache[str(md_file)] = headings_dict - - if verbose: - total_anchors = sum(len(headings) for headings in anchor_cache.values()) - output.add_verbose_line(f" Cached {total_anchors} anchors from {len(anchor_cache)} files") - output.add_blank_line("working_verbose") + anchor_cache = _build_anchor_cache(md_files, file_cache, output, verbose) # Validate links - broken_links: List[ValidationIssue] = [] - total_links = 0 - files_with_links = 0 - - if verbose: - output.add_verbose_line("Validating links...") - for md_file in md_files: - links = extract_links(str(md_file), file_cache) - - if links: - files_with_links += 1 - ctx = LinkValidationContext( - md_file=md_file, - anchor_cache=anchor_cache, - file_cache=file_cache, - repo_root=root_dir, - verbose=verbose - ) - - for link_text, link_target, line_num in links: - total_links += 1 - broken_links.extend(validate_link_target( - link_text, - link_target, - line_num, - ctx - )) + broken_links, total_links, files_with_links = _validate_links( + md_files, + anchor_cache, + file_cache, + root_dir, + verbose + ) # Check requirements 
coverage if requested - coverage_issues = [] - if check_coverage: - output.add_verbose_line("Checking requirements coverage...") - coverage_issues = validate_requirements_coverage( - requirements_files, - tech_spec_files, - anchor_cache, - file_cache - ) - output.add_blank_line("working_verbose") + coverage_issues = _check_coverage_if_requested( + check_coverage, + requirements_files=requirements_files, + tech_spec_files=tech_spec_files, + anchor_cache=anchor_cache, + file_cache=file_cache, + output=output + ) summary_items = [ ("Files scanned:", len(md_files)), @@ -747,223 +711,29 @@ def main(): output.add_summary_section(summary_items) # Report coverage issues first - if coverage_issues: - output.add_warnings_header() - output.add_line( - "The following requirements files don't reference any tech specs:", - section="warning" - ) - output.add_blank_line("warning") - - for issue in coverage_issues: - # Convert ValidationIssue to format message if needed - if isinstance(issue, ValidationIssue): - warning_msg = issue.format_message(no_color=no_color) - else: - warning_msg = format_issue_message( - "warning", - "No tech spec refs", - issue.get('file', ''), - None, - issue.get('issue', 'No tech spec references found'), - no_color - ) - output.add_warning_line(warning_msg) - - output.add_blank_line("warning") - output.add_line( - "Note: After adding tech spec references, verify that each " - "reference points to the correct content.", - section="warning" - ) + _emit_coverage_warnings(output, coverage_issues, no_color) # Report broken links + _emit_broken_links(output, broken_links, no_color, verbose) + + # Final status if broken_links: - output.add_errors_header() - - # Group by file for better readability - by_file = defaultdict(list) - for broken in broken_links: - # broken_links contains ValidationIssue objects - file_key = broken.file - by_file[file_key].append(broken) - - # Separate files by category for better organization - req_files = sorted([f for f in 
by_file.keys() - if 'requirements/' in f]) - spec_files = sorted([f for f in by_file.keys() - if 'tech_specs/' in f]) - docs_root_files = sorted([f for f in by_file.keys() - if f.startswith('docs/') - and '/' not in f[5:]]) - other_files_broken = sorted([f for f in by_file.keys() - if f not in req_files - and f not in spec_files - and f not in docs_root_files]) - - if req_files: - output.add_line("## Requirements Files", section="error") - for file_path in req_files: - output.add_error_line(f"{file_path}:") - for broken in by_file[file_path]: - # broken is a ValidationIssue object - issue_msg = broken.message - if " in " in issue_msg: - link_info = issue_msg.split(" in ", 1)[1] - else: - link_info = issue_msg - error_output = format_issue_message( - "error", - "Broken link", - file_path, - broken.start_line, - link_info, - broken.suggestion, - no_color - ) - output.add_error_line(error_output) - - # Add verbose output for suggestion details - if verbose and broken.extra_fields.get('suggestion_details'): - details = broken.extra_fields['suggestion_details'] - scores = details.get('scores', {}) - output.add_verbose_line( - f" Suggestion scores: word_match={scores.get('word_match', 0):.1f}, " - f"anchor_similarity={scores.get('anchor_similarity', 0):.1f}, " - f"context={scores.get('context', 0):.1f}, " - f"normalization={scores.get('normalization', 0):.1f}, " - f"total={details.get('total_score', 0):.1f}" - ) - - if spec_files: - output.add_blank_line("error") - output.add_line("## Tech Spec Files", section="error") - for file_path in spec_files: - output.add_error_line(f"{file_path}:") - for broken in by_file[file_path]: - # broken is a ValidationIssue object - issue_msg = broken.message - if " in " in issue_msg: - link_info = issue_msg.split(" in ", 1)[1] - else: - link_info = issue_msg - error_output = format_issue_message( - "error", - "Broken link", - file_path, - broken.start_line, - link_info, - broken.suggestion, - no_color - ) - 
output.add_error_line(error_output) - - # Add verbose output for suggestion details - if verbose and broken.extra_fields.get('suggestion_details'): - details = broken.extra_fields['suggestion_details'] - scores = details.get('scores', {}) - output.add_verbose_line( - f" Suggestion scores: word_match={scores.get('word_match', 0):.1f}, " - f"anchor_similarity={scores.get('anchor_similarity', 0):.1f}, " - f"context={scores.get('context', 0):.1f}, " - f"normalization={scores.get('normalization', 0):.1f}, " - f"total={details.get('total_score', 0):.1f}" - ) - - if docs_root_files: - output.add_blank_line("error") - output.add_line("## Root Documentation Files", section="error") - for file_path in docs_root_files: - output.add_error_line(f"{file_path}:") - for broken in by_file[file_path]: - # broken is a ValidationIssue object - issue_msg = broken.message - if " in " in issue_msg: - link_info = issue_msg.split(" in ", 1)[1] - else: - link_info = issue_msg - error_output = format_issue_message( - "error", - "Broken link", - file_path, - broken.start_line, - link_info, - broken.suggestion, - no_color - ) - output.add_error_line(error_output) - - # Add verbose output for suggestion details - if verbose and broken.extra_fields.get('suggestion_details'): - details = broken.extra_fields['suggestion_details'] - scores = details.get('scores', {}) - output.add_verbose_line( - f" Suggestion scores: word_match={scores.get('word_match', 0):.1f}, " - f"anchor_similarity={scores.get('anchor_similarity', 0):.1f}, " - f"context={scores.get('context', 0):.1f}, " - f"normalization={scores.get('normalization', 0):.1f}, " - f"total={details.get('total_score', 0):.1f}" - ) - - if other_files_broken: - output.add_blank_line("error") - output.add_line("## Other Files", section="error") - for file_path in other_files_broken: - output.add_error_line(f"{file_path}:") - for broken in by_file[file_path]: - # broken is a ValidationIssue object - issue_msg = broken.message - if " in " in issue_msg: 
- link_info = issue_msg.split(" in ", 1)[1] - else: - link_info = issue_msg - error_output = format_issue_message( - "error", - "Broken link", - file_path, - broken.start_line, - link_info, - broken.suggestion, - no_color - ) - output.add_error_line(error_output) - - # Add verbose output for suggestion details - if verbose and broken.extra_fields.get('suggestion_details'): - details = broken.extra_fields['suggestion_details'] - scores = details.get('scores', {}) - output.add_verbose_line( - f" Suggestion scores: word_match={scores.get('word_match', 0):.1f}, " - f"anchor_similarity={scores.get('anchor_similarity', 0):.1f}, " - f"context={scores.get('context', 0):.1f}, " - f"normalization={scores.get('normalization', 0):.1f}, " - f"total={details.get('total_score', 0):.1f}" - ) - - # Check if any broken links point to tech specs - has_tech_spec_links = any( - 'tech_specs/' in ( - broken.extra_fields.get('target', '') - if isinstance(broken, ValidationIssue) - else broken.get('target', '') + output.add_failure_message("Validation failed. Please fix the errors above.") + elif coverage_issues: + msg = "All links are valid. Review the warnings above." + if check_coverage: + msg = ( + "All links are valid. All requirements reference tech specs. " + "Review the warnings above." ) - for broken in broken_links + output.add_warnings_only_message( + message=msg, + verbose_hint="Run with --verbose to see the full warning details.", ) - if has_tech_spec_links: - output.add_blank_line("error") - output.add_line( - "Note: After fixing broken links to tech specs, verify that each " - "updated reference points to the correct content.", - section="error" - ) - - # Final status - if not broken_links and not coverage_issues: + else: output.add_success_message("All links are valid!") if check_coverage: output.add_success_message("All requirements reference tech specs!") - else: - output.add_failure_message("Validation failed. 
Please fix the errors above.") output.print() return output.get_exit_code(no_fail) diff --git a/scripts/validate_req_references.py b/scripts/validate_req_references.py index b30fabf5..1c444025 100644 --- a/scripts/validate_req_references.py +++ b/scripts/validate_req_references.py @@ -40,16 +40,9 @@ import argparse from pathlib import Path from collections import defaultdict +from typing import Dict, List, Optional, Tuple -scripts_dir = Path(__file__).parent -lib_dir = scripts_dir / "lib" - -# Import shared utilities -for module_path in (str(scripts_dir), str(lib_dir)): - if module_path not in sys.path: - sys.path.insert(0, module_path) - -from lib._validation_utils import ( # noqa: E402 +from lib._validation_utils import ( OutputBuilder, parse_no_color_flag, is_in_dot_directory, get_workspace_root, parse_paths, ValidationIssue, DOCS_DIR, REQUIREMENTS_DIR, FEATURES_DIR @@ -135,7 +128,7 @@ def extract_req_tags_from_feature(feature_file, verbose=False): f" Warning: Could not decode {feature_file} (encoding issue): {e}", file=sys.stderr ) - except Exception as e: + except (ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: # Unexpected errors - log warning if verbose if verbose: print(f" Warning: Unexpected error reading {feature_file}: {e}", file=sys.stderr) @@ -181,7 +174,7 @@ def extract_req_definitions_from_requirements(req_file, verbose=False): f"in {req_file.name}" ) - except Exception as e: + except (ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: if verbose: print(f" Warning: Could not read {req_file}: {e}") @@ -230,7 +223,7 @@ def validate_req_format(req_id): return bool(_RE_REQ_FORMAT_PATTERN.match(req_id)) -def validate_req_format_errors(requirements_data, feature_tags, workspace_root, features_dir): +def validate_req_format_errors(requirements_data, feature_tags, workspace_root, _features_dir): """ Validate format consistency of all requirement IDs. 
@@ -256,12 +249,12 @@ def get_relative_path(file_path, base_dir): for req_file, req_list in requirements_data.items(): for req_id, line_num, _ in req_list: if not _RE_REQ_FORMAT_PATTERN.match(req_id): - format_errors.append(ValidationIssue( + format_errors.append(ValidationIssue.create( 'format', Path(get_relative_path(req_file, workspace_root)), line_num, line_num, - f"{req_id}: Invalid format: does not match REQ-[A-Z_]+-[0-9]+", + message=f"{req_id}: Invalid format: does not match REQ-[A-Z_]+-[0-9]+", severity='error', req_id=req_id, reason='Invalid format: does not match REQ-[A-Z_]+-[0-9]+' @@ -271,12 +264,12 @@ def get_relative_path(file_path, base_dir): for req_id, feature_files in feature_tags.items(): if not _RE_REQ_FORMAT_PATTERN.match(req_id): for feature_file in feature_files: - format_errors.append(ValidationIssue( + format_errors.append(ValidationIssue.create( 'format', Path(get_relative_path(feature_file, workspace_root)), 1, 1, - f"{req_id}: Invalid format: does not match REQ-[A-Z_]+-[0-9]+", + message=f"{req_id}: Invalid format: does not match REQ-[A-Z_]+-[0-9]+", severity='error', req_id=req_id, reason='Invalid format: does not match REQ-[A-Z_]+-[0-9]+' @@ -347,7 +340,7 @@ def get_relative_path(file_path, base_dir): for req_file, req_list in requirements_data.items(): # Group by category prefix category_reqs = defaultdict(list) - for req_id, line_num, req_type in req_list: + for req_id, line_num, _req_type in req_list: category = get_category_from_req_id(req_id) if category: # Extract numeric suffix @@ -375,12 +368,12 @@ def get_relative_path(file_path, base_dir): if missing_numbers: missing_str = ', '.join(str(num) for num in missing_numbers) - sequential_warnings.append(ValidationIssue( + sequential_warnings.append(ValidationIssue.create( 'sequential', Path(get_relative_path(req_file, workspace_root)), 1, 1, - f"REQ-{category}-* (Missing numbers: {missing_str})", + message=f"REQ-{category}-* (Missing numbers: {missing_str})", severity='warning', 
category=category, missing_numbers=missing_numbers @@ -389,6 +382,38 @@ def get_relative_path(file_path, base_dir): return sequential_warnings +def _collect_feature_files_for_stub_check(features_dir, target_paths, verbose): + """Return list of feature file paths to check for stubs.""" + if target_paths: + out = [] + for target_path in target_paths: + target = Path(target_path) + if not target.exists(): + if verbose: + print(f"Warning: Target path does not exist: {target_path}") + continue + if target.is_file(): + if target.suffix == '.feature': + out.append(target) + elif verbose: + print(f"Warning: Target file is not a .feature file: {target_path}") + else: + out.extend(f for f in target.rglob('*.feature') if not is_in_dot_directory(f)) + return out + return [f for f in features_dir.rglob('*.feature') if not is_in_dot_directory(f)] + + +def _count_content_lines(path: Path) -> int: + """Return number of non-empty, non-comment lines in a feature file.""" + count = 0 + with open(path, 'r', encoding='utf-8') as f: + for line in f: + stripped = line.strip() + if stripped and not stripped.startswith('#'): + count += 1 + return count + + def check_feature_stubs(features_dir, verbose=False, target_paths=None): """ Check for feature files that are stubs (8 or fewer content lines). @@ -405,107 +430,46 @@ def check_feature_stubs(features_dir, verbose=False, target_paths=None): List of feature files that are stubs (each entry is a dict with 'file' and 'line_count' keys) """ - # Note: This function is called before OutputBuilder is used, - # so we can't use output.add_verbose_line here. - # The message will be handled by the caller if needed. 
- - # Determine which files to scan (same logic as validate_req_references) - feature_files = [] - if target_paths: - for target_path in target_paths: - target = Path(target_path) - if not target.exists(): - if verbose: - print(f"Warning: Target path does not exist: {target_path}") - continue - - if target.is_file(): - if target.suffix == '.feature': - feature_files.append(target) - elif verbose: - print(f"Warning: Target file is not a .feature file: {target_path}") - else: - feature_files.extend([ - f for f in target.rglob('*.feature') - if not is_in_dot_directory(f) - ]) - else: - feature_files = [ - f for f in features_dir.rglob('*.feature') - if not is_in_dot_directory(f) - ] - + feature_files = _collect_feature_files_for_stub_check( + features_dir, target_paths, verbose + ) stub_files = [] - for feature_file in feature_files: try: - content_line_count = 0 - with open(feature_file, 'r', encoding='utf-8') as f: - for line in f: - stripped = line.strip() - # Skip empty lines and comments - if stripped and not stripped.startswith('#'): - content_line_count += 1 - + content_line_count = _count_content_lines(feature_file) if content_line_count <= 8: - stub_files.append(ValidationIssue( - 'stub_file', - feature_file, - 1, - 1, - f"Feature file is a stub ({content_line_count} content lines)", - severity='warning', - line_count=content_line_count + stub_files.append(ValidationIssue.create( + 'stub_file', feature_file, 1, 1, + message=f"Feature file is a stub ({content_line_count} content lines)", + severity='warning', line_count=content_line_count, )) if verbose: print(f" Found stub: {feature_file.name} ({content_line_count} content lines)") - except (IOError, OSError) as e: - # File read errors - log warning if verbose if verbose: print(f" Warning: Could not read {feature_file}: {e}", file=sys.stderr) except UnicodeDecodeError as e: - # Encoding errors - log warning if verbose if verbose: print( f" Warning: Could not decode {feature_file} (encoding issue): {e}", - 
file=sys.stderr + file=sys.stderr, ) - except Exception as e: - # Unexpected errors - log warning if verbose + except (ValueError, KeyError, TypeError, AttributeError, RuntimeError) as e: if verbose: print(f" Warning: Unexpected error reading {feature_file}: {e}", file=sys.stderr) - if verbose: print() - return stub_files -def validate_req_references( - features_dir, requirements_dir, output, verbose=False, target_paths=None, no_color=False -): - """ - Validate that all REQ tags in feature files exist in requirements files. - - Args: - features_dir: Path to the features directory - requirements_dir: Path to the requirements directory - verbose: Whether to show detailed progress - target_paths: Optional list of specific files or directories to check - no_color: Whether to disable colored output - - Returns: - Tuple of (total_refs, invalid_refs, missing_refs, errors, format_errors, - duplicate_errors, sequential_warnings) - """ - # Determine workspace root (requirements_dir is docs/requirements, so parent.parent is root) - workspace_root = requirements_dir.parent.parent - - # Load all requirement definitions from requirements files with line numbers - all_requirements = {} - req_files = {} - requirements_data = {} # req_file -> list of (req_id, line_num) tuples +def _load_requirement_definitions( + requirements_dir: Path, + output: OutputBuilder, + verbose: bool, +) -> Tuple[Dict[str, str], Dict[Path, List[Tuple[str, int, int]]]]: + all_requirements: Dict[str, str] = {} + req_files: Dict[str, str] = {} + requirements_data: Dict[Path, List[Tuple[str, int, int]]] = {} output.add_verbose_line("Loading requirement definitions...") if verbose: @@ -532,14 +496,15 @@ def validate_req_references( f"from {len(req_files)} files" ) output.add_blank_line("working_verbose") + return all_requirements, requirements_data - # Scan all feature files for REQ tags - output.add_verbose_line("Scanning feature files...") - if verbose: - output.add_blank_line("working_verbose") - # 
Determine which files to scan - feature_files = [] +def _collect_feature_files( + features_dir: Path, + target_paths: Optional[List[str]], + output: OutputBuilder, +) -> List[Path]: + feature_files: List[Path] = [] if target_paths: for target_path in target_paths: target = Path(target_path) @@ -551,7 +516,9 @@ def validate_req_references( if target.suffix == '.feature': feature_files.append(target) else: - output.add_warning_line(f"Target file is not a .feature file: {target_path}") + output.add_warning_line( + f"Target file is not a .feature file: {target_path}" + ) else: feature_files.extend([ f for f in target.rglob('*.feature') @@ -562,36 +529,28 @@ def validate_req_references( f for f in features_dir.rglob('*.feature') if not is_in_dot_directory(f) ] + return feature_files - all_req_tags = defaultdict(list) # req_id -> list of feature files +def _collect_req_tags( + feature_files: List[Path], + verbose: bool, +) -> Dict[str, List[Path]]: + all_req_tags: Dict[str, List[Path]] = defaultdict(list) for feature_file in feature_files: req_tags = extract_req_tags_from_feature(feature_file, verbose) for req_id in req_tags: all_req_tags[req_id].append(feature_file) + return all_req_tags - if verbose: - output.add_blank_line("working_verbose") - output.add_verbose_line( - f"Found {len(all_req_tags)} unique REQ tags " - f"across {len(feature_files)} feature files" - ) - output.add_blank_line("working_verbose") - - # Run format validation first - format_errors = validate_req_format_errors( - requirements_data, all_req_tags, workspace_root, features_dir - ) - - # Check for duplicates - duplicate_errors = check_duplicate_requirements(requirements_data, workspace_root) - - # Check sequential numbering (warnings, not errors) - sequential_warnings = check_sequential_numbering(requirements_data, workspace_root) - invalid_refs = [] - missing_refs = [] - errors = [] +def _validate_req_tags( + all_req_tags: Dict[str, List[Path]], + all_requirements: Dict[str, str], +) -> 
Tuple[List[ValidationIssue], List[ValidationIssue], List[ValidationIssue]]: + invalid_refs: List[ValidationIssue] = [] + missing_refs: List[ValidationIssue] = [] + errors: List[ValidationIssue] = [] for req_id in sorted(all_req_tags.keys()): category = get_category_from_req_id(req_id) @@ -599,12 +558,12 @@ def validate_req_references( if not category: first_file = list(all_req_tags[req_id])[0] if all_req_tags[req_id] else None if first_file: - errors.append(ValidationIssue( + errors.append(ValidationIssue.create( 'invalid_req_format', first_file, 1, 1, - f"{req_id}: Invalid REQ ID format", + message=f"{req_id}: Invalid REQ ID format", severity='error', req_id=req_id, reason='Invalid REQ ID format', @@ -618,12 +577,15 @@ def validate_req_references( first_file = list(all_req_tags[req_id])[0] if all_req_tags[req_id] else None if first_file: if replacement: - errors.append(ValidationIssue( + errors.append(ValidationIssue.create( 'deprecated_category', first_file, 1, 1, - f"{req_id}: Deprecated category prefix: {category} (use {replacement})", + message=( + f"{req_id}: Deprecated category prefix: {category} " + f"(use {replacement})" + ), severity='error', req_id=req_id, reason=f'Deprecated category prefix: {category} (use {replacement})', @@ -632,12 +594,12 @@ def validate_req_references( suggested_category=replacement, )) else: - errors.append(ValidationIssue( + errors.append(ValidationIssue.create( 'invalid_category', first_file, 1, 1, - f"{req_id}: Invalid category prefix: {category}", + message=f"{req_id}: Invalid category prefix: {category}", severity='error', req_id=req_id, reason=f'Invalid category prefix: {category}', @@ -651,12 +613,12 @@ def validate_req_references( if not expected_file: first_file = list(all_req_tags[req_id])[0] if all_req_tags[req_id] else None if first_file: - errors.append(ValidationIssue( + errors.append(ValidationIssue.create( 'unknown_category', first_file, 1, 1, - f"{req_id}: Unknown category: {category}", + message=f"{req_id}: 
Unknown category: {category}", severity='error', req_id=req_id, reason=f'Unknown category: {category}', @@ -667,12 +629,12 @@ def validate_req_references( if req_id not in all_requirements: first_file = list(all_req_tags[req_id])[0] if all_req_tags[req_id] else None if first_file: - missing_refs.append(ValidationIssue( + missing_refs.append(ValidationIssue.create( 'missing_ref', first_file, 1, 1, - f"{req_id} not found in {expected_file}", + message=f"{req_id} not found in {expected_file}", severity='error', req_id=req_id, expected_file=expected_file, @@ -681,91 +643,143 @@ def validate_req_references( elif all_requirements[req_id] != expected_file: first_file = list(all_req_tags[req_id])[0] if all_req_tags[req_id] else None if first_file: - invalid_refs.append(ValidationIssue( + invalid_refs.append(ValidationIssue.create( 'invalid_ref', first_file, 1, 1, - f"{req_id} found in {all_requirements[req_id]}, expected {expected_file}", + message=( + f"{req_id} found in {all_requirements[req_id]}, " + f"expected {expected_file}" + ), severity='error', req_id=req_id, expected_file=expected_file, actual_file=all_requirements[req_id], files=list(all_req_tags[req_id]) )) + return invalid_refs, missing_refs, errors - # Helper function to display file paths relative to features_dir or absolute - def display_path(feature_file, base_dir): - try: - return str(feature_file.relative_to(base_dir)) - except ValueError: - return str(feature_file) - # Report warnings first (sequential numbering gaps) - if sequential_warnings: +def _display_path(feature_file: Path, base_dir: Path) -> str: + try: + return str(feature_file.relative_to(base_dir)) + except ValueError: + return str(feature_file) + + +def _emit_issue_list( + output: OutputBuilder, + issues: List[ValidationIssue], + no_color: bool, + *, + header: str, + features_dir: Optional[Path] = None, + show_files: bool = False, +) -> None: + if not issues: + return + if header == "warning": output.add_warnings_header() - for warning in 
sequential_warnings: - # warning is a ValidationIssue - warning_msg = warning.format_message(no_color=no_color) - output.add_warning_line(warning_msg) - - # Report format errors - if format_errors: - output.add_errors_header() - for error in format_errors: - # error is a ValidationIssue - error_msg = error.format_message(no_color=no_color) - output.add_error_line(error_msg) - - # Report duplicate errors - if duplicate_errors: - output.add_errors_header() - for error in duplicate_errors: - # error is a ValidationIssue - error_msg = error.format_message(no_color=no_color) - output.add_error_line(error_msg) - - # Report errors (invalid format or unknown category from feature tags) - if errors: + else: output.add_errors_header() - for error in errors: - # error is a ValidationIssue - error_msg = error.format_message(no_color=no_color) - output.add_error_line(error_msg) - # Show additional files if any - files = error.extra_fields.get('files', []) + for issue in issues: + issue_msg = issue.format_message(no_color=no_color) + if header == "warning": + output.add_warning_line(issue_msg) + else: + output.add_error_line(issue_msg) + if show_files and features_dir: + files = issue.extra_fields.get('files', []) for feature_file in files[1:]: output.add_error_line( - f" Also in: {display_path(feature_file, features_dir)}" + f" Also in: {_display_path(feature_file, features_dir)}" ) - # Report invalid references (wrong file) - if invalid_refs: - output.add_errors_header() - for ref in invalid_refs: - # ref is a ValidationIssue - error_msg = ref.format_message(no_color=no_color) - output.add_error_line(error_msg) - # Show additional files if any - files = ref.extra_fields.get('files', []) - for feature_file in files[1:]: - output.add_error_line( - f" Also in: {display_path(feature_file, features_dir)}" - ) - # Report missing references - if missing_refs: - output.add_errors_header() - for ref in missing_refs: - # ref is a ValidationIssue - error_msg = 
ref.format_message(no_color=no_color) - output.add_error_line(error_msg) - # Show additional files if any - files = ref.extra_fields.get('files', []) - for feature_file in files[1:]: - output.add_error_line( - f" Also in: {display_path(feature_file, features_dir)}" - ) +def validate_req_references( + features_dir, requirements_dir, output, *, verbose=False, target_paths=None, no_color=False +): + """ + Validate that all REQ tags in feature files exist in requirements files. + + Args: + features_dir: Path to the features directory + requirements_dir: Path to the requirements directory + verbose: Whether to show detailed progress + target_paths: Optional list of specific files or directories to check + no_color: Whether to disable colored output + + Returns: + Tuple of (total_refs, invalid_refs, missing_refs, errors, format_errors, + duplicate_errors, sequential_warnings) + """ + # Determine workspace root (requirements_dir is docs/requirements, so parent.parent is root) + workspace_root = requirements_dir.parent.parent + + all_requirements, requirements_data = _load_requirement_definitions( + requirements_dir, output, verbose + ) + + # Scan all feature files for REQ tags + output.add_verbose_line("Scanning feature files...") + if verbose: + output.add_blank_line("working_verbose") + + feature_files = _collect_feature_files(features_dir, target_paths, output) + all_req_tags = _collect_req_tags(feature_files, verbose) + + if verbose: + output.add_blank_line("working_verbose") + output.add_verbose_line( + f"Found {len(all_req_tags)} unique REQ tags " + f"across {len(feature_files)} feature files" + ) + output.add_blank_line("working_verbose") + + # Run format validation first + format_errors = validate_req_format_errors( + requirements_data, all_req_tags, workspace_root, features_dir + ) + + # Check for duplicates + duplicate_errors = check_duplicate_requirements(requirements_data, workspace_root) + + # Check sequential numbering (warnings, not errors) + 
sequential_warnings = check_sequential_numbering(requirements_data, workspace_root) + + invalid_refs, missing_refs, errors = _validate_req_tags( + all_req_tags, all_requirements + ) + + # Report warnings first (sequential numbering gaps) + _emit_issue_list(output, sequential_warnings, no_color, header="warning") + _emit_issue_list(output, format_errors, no_color, header="error") + _emit_issue_list(output, duplicate_errors, no_color, header="error") + _emit_issue_list( + output, + errors, + no_color, + header="error", + features_dir=features_dir, + show_files=True + ) + _emit_issue_list( + output, + invalid_refs, + no_color, + header="error", + features_dir=features_dir, + show_files=True + ) + _emit_issue_list( + output, + missing_refs, + no_color, + header="error", + features_dir=features_dir, + show_files=True + ) # Summary total_refs = len(all_req_tags) @@ -885,10 +899,11 @@ def display_path(feature_file, base_dir): # Validate requirement references ( - total, invalid, missing, errors, format_errors, duplicate_errors, - sequential_warnings + _total, invalid, missing, errors, format_errors, duplicate_errors, + _sequential_warnings ) = validate_req_references( - features_dir, requirements_dir, output, args.verbose, target_paths, no_color + features_dir, requirements_dir, output, + verbose=args.verbose, target_paths=target_paths, no_color=no_color ) # Return error code only if errors found (warnings don't cause failure) @@ -898,7 +913,12 @@ def display_path(feature_file, base_dir): ) if not has_errors: - output.add_success_message("All requirement references are valid!") + if output.has_warnings(): + output.add_warnings_only_message( + verbose_hint="Run with --verbose to see the full warning details.", + ) + else: + output.add_success_message("All requirement references are valid!") else: output.add_failure_message("Validation failed. 
Please fix the errors above.") From 5b2bd797f87fe54d096cdd35a0526f76b734ee03 Mon Sep 17 00:00:00 2001 From: Andre Date: Tue, 3 Feb 2026 03:38:55 -0500 Subject: [PATCH 4/7] docs(spec): align core/package read-write API references Update tech specs and requirements links/headings to reflect the current Package read/write operations framing and refresh the Go definitions index entries. --- docs/requirements/core.md | 108 +- docs/requirements/metadata.md | 2 +- docs/tech_specs/api_basic_operations.md | 129 +- docs/tech_specs/api_core.md | 204 +-- docs/tech_specs/api_file_mgmt_extraction.md | 2 +- docs/tech_specs/api_file_mgmt_file_entry.md | 79 +- docs/tech_specs/api_file_mgmt_queries.md | 9 +- docs/tech_specs/api_generics.md | 9 +- docs/tech_specs/api_go_defs_index.md | 1288 +++++++++---------- docs/tech_specs/api_metadata.md | 28 +- docs/tech_specs/file_validation.md | 2 +- 11 files changed, 970 insertions(+), 890 deletions(-) diff --git a/docs/requirements/core.md b/docs/requirements/core.md index 20f6ae53..f1a0dcfb 100644 --- a/docs/requirements/core.md +++ b/docs/requirements/core.md @@ -2,23 +2,23 @@ ## Core Interfaces -- REQ-CORE-004: PackageReader interface provides read-only package access [type: architectural]. [api_core.md#12-packagereader-interface](../tech_specs/api_core.md#12-packagereader-interface) -- REQ-CORE-005: PackageWriter interface provides package modification capabilities [type: architectural]. [api_core.md#13-packagewriter-interface](../tech_specs/api_core.md#13-packagewriter-interface) -- REQ-CORE-185: PackageReader contract defines read-only interface for opened packages [type: architectural]. [api_core.md#121-packagereader-contract](../tech_specs/api_core.md#121-packagereader-contract) -- REQ-CORE-186: Reader contract scope defines PackageReader method assumptions [type: architectural]. 
[api_core.md#1211-reader-contract-scope](../tech_specs/api_core.md#1211-reader-contract-scope) +- REQ-CORE-004: Package read operations provide read-only package access [type: architectural]. [api_core.md#12-package-read-operations](../tech_specs/api_core.md#12-package-read-operations) +- REQ-CORE-005: Package write operations provide package persistence capabilities [type: architectural]. [api_core.md#13-package-write-operations](../tech_specs/api_core.md#13-package-write-operations) +- REQ-CORE-185: Read operations contract defines read-only behavior for opened packages [type: architectural]. [api_core.md#121-read-operations-contract](../tech_specs/api_core.md#121-read-operations-contract) +- REQ-CORE-186: Reader contract scope defines read operation assumptions [type: architectural]. [api_core.md#1211-reader-contract-scope](../tech_specs/api_core.md#1211-reader-contract-scope) - REQ-CORE-187: OpenPackage eager metadata load requires all package metadata to be loaded into memory [type: constraint]. [api_core.md#1212-openpackage-eager-metadata-load](../tech_specs/api_core.md#1212-openpackage-eager-metadata-load) -- REQ-CORE-188: Context usage defines when context.Context is required for PackageReader methods [type: architectural]. [api_core.md#02-context-integration](../tech_specs/api_core.md#02-context-integration) (Exception: also [api_core.md#1213-context-usage](../tech_specs/api_core.md#1213-context-usage) for coverage.) +- REQ-CORE-188: Context usage defines when context.Context is required for package read operations [type: architectural]. [api_core.md#02-context-integration](../tech_specs/api_core.md#02-context-integration) (Exception: also [api_core.md#1213-context-usage](../tech_specs/api_core.md#1213-context-usage) for coverage.) - REQ-CORE-189: Code reuse requirement defines shared helper functions for operations with underlying functionality [type: architectural]. 
[api_core.md#1215-code-reuse-requirement](../tech_specs/api_core.md#1215-code-reuse-requirement) -- REQ-CORE-080: ReadFile cross-reference provides reference to lightweight header-only inspection [type: documentation-only] (documentation-only: cross-reference - DO NOT CREATE FEATURE FILE). [api_core.md#122-packagereaderreadfile-method](../tech_specs/api_core.md#122-packagereaderreadfile-method) +- REQ-CORE-080: ReadFile cross-reference provides reference to lightweight header-only inspection [type: documentation-only] (documentation-only: cross-reference - DO NOT CREATE FEATURE FILE). [api_core.md#122-packagereadfile-method](../tech_specs/api_core.md#122-packagereadfile-method) - REQ-CORE-081: NormalizePackagePath function normalizes package paths [type: architectural]. [api_core.md#121-normalizepackagepath-function](../tech_specs/api_core.md#121-normalizepackagepath-function), [api_core.md#1211-normalizepackagepath-error-handling](../tech_specs/api_core.md#1211-normalizepackagepath-error-handling), [api_core.md#1212-normalizepackagepath-return-value](../tech_specs/api_core.md#1212-normalizepackagepath-return-value) - REQ-CORE-082: ToDisplayPath function converts stored paths to display paths [type: architectural]. [api_core.md#122-todisplaypath-function](../tech_specs/api_core.md#122-todisplaypath-function), [api_core.md#1221-todisplaypath-behavior](../tech_specs/api_core.md#1221-todisplaypath-behavior), [api_core.md#1222-todisplaypath-usage](../tech_specs/api_core.md#1222-todisplaypath-usage) - REQ-CORE-083: Path validation defines invalid path conditions and validation rules [type: constraint]. [api_core.md#123-validatepackagepath-function](../tech_specs/api_core.md#123-validatepackagepath-function), [api_core.md#124-validatepathlength-function](../tech_specs/api_core.md#124-validatepathlength-function) -- REQ-CORE-084: ReadFile method contract defines read-only file reading interface [type: architectural]. 
[api_core.md#122-packagereaderreadfile-method](../tech_specs/api_core.md#122-packagereaderreadfile-method) +- REQ-CORE-084: ReadFile method contract defines read-only file reading interface [type: architectural]. [api_core.md#122-packagereadfile-method](../tech_specs/api_core.md#122-packagereadfile-method) - REQ-CORE-006: Package interface exposes core package operations [type: architectural]. [api_core.md#11-package-interface](../tech_specs/api_core.md#11-package-interface) -- REQ-CORE-061: Package interface specification defines unified package operations combining reader and writer capabilities [type: architectural]. [api_core.md#11-package-interface](../tech_specs/api_core.md#11-package-interface) +- REQ-CORE-061: Package interface specification defines unified package operations combining read and write capabilities [type: architectural]. [api_core.md#11-package-interface](../tech_specs/api_core.md#11-package-interface) - REQ-CORE-019: Core interfaces define package interface contracts [type: architectural]. [api_core.md#1-core-interfaces](../tech_specs/api_core.md#1-core-interfaces) -- REQ-CORE-047: PackageReader opened package contract defines reader scope [type: architectural]. [api_core.md#121-packagereader-contract](../tech_specs/api_core.md#121-packagereader-contract), [api_core.md#111-filepackage-struct](../tech_specs/api_core.md#111-filepackage-struct) (Exception: also [api_core.md#1111-filepackage-field-descriptions](../tech_specs/api_core.md#1111-filepackage-field-descriptions) for coverage.) -- REQ-CORE-048: OpenPackage eager metadata load loads all required metadata [type: constraint]. [api_core.md#121-packagereader-contract](../tech_specs/api_core.md#121-packagereader-contract) +- REQ-CORE-047: Opened package read operations contract defines reader scope [type: architectural]. 
[api_core.md#121-read-operations-contract](../tech_specs/api_core.md#121-read-operations-contract), [api_core.md#111-filepackage-struct](../tech_specs/api_core.md#111-filepackage-struct) (Exception: also [api_core.md#1111-filepackage-field-descriptions](../tech_specs/api_core.md#1111-filepackage-field-descriptions) for coverage.) +- REQ-CORE-048: OpenPackage eager metadata load loads all required metadata [type: constraint]. [api_core.md#121-read-operations-contract](../tech_specs/api_core.md#121-read-operations-contract) - REQ-CORE-049: Package path semantics define package-internal path rules [type: constraint]. [api_core.md#2-package-path-semantics](../tech_specs/api_core.md#2-package-path-semantics) [api_core.md#22-path-rules](../tech_specs/api_core.md#22-path-rules) @@ -34,14 +34,14 @@ - REQ-CORE-058: Input paths without leading slash are automatically prefixed during normalization [type: constraint]. [api_core.md#212-leading-slash-requirement](../tech_specs/api_core.md#212-leading-slash-requirement) - REQ-CORE-059: Leading slash indicates package root, not OS filesystem root [type: constraint]. [api_core.md#212-leading-slash-requirement](../tech_specs/api_core.md#212-leading-slash-requirement) - REQ-CORE-060: Display paths MUST strip leading slash before showing to end users [type: constraint]. [api_core.md#23-path-display-and-extraction](../tech_specs/api_core.md#23-path-display-and-extraction) -- REQ-CORE-050: ListFiles returns results sorted by PrimaryPath alphabetically [type: constraint]. [api_core.md#1233-packagereaderlistfiles-behavior](../tech_specs/api_core.md#1233-packagereaderlistfiles-behavior) -- REQ-CORE-051: ListFiles results are stable across calls when package state unchanged [type: constraint]. [api_core.md#1233-packagereaderlistfiles-behavior](../tech_specs/api_core.md#1233-packagereaderlistfiles-behavior) -- REQ-CORE-085: ListFiles purpose defines file information retrieval [type: architectural]. 
[api_core.md#123-packagereaderlistfiles-method](../tech_specs/api_core.md#123-packagereaderlistfiles-method) -- REQ-CORE-086: ListFiles parameters define pure in-memory operation [type: architectural]. [api_core.md#1231-packagereaderlistfiles-parameters](../tech_specs/api_core.md#1231-packagereaderlistfiles-parameters) -- REQ-CORE-087: ListFiles returns define sorted file information slice [type: architectural]. [api_core.md#1232-packagereaderlistfiles-returns](../tech_specs/api_core.md#1232-packagereaderlistfiles-returns) -- REQ-CORE-088: ListFiles behavior defines sorting, stability, and mutation handling [type: architectural]. [api_core.md#1233-packagereaderlistfiles-behavior](../tech_specs/api_core.md#1233-packagereaderlistfiles-behavior) -- REQ-CORE-089: ListFiles error conditions reference common error mapping table [type: architectural]. [api_core.md#1234-packagereaderlistfiles-error-conditions](../tech_specs/api_core.md#1234-packagereaderlistfiles-error-conditions) -- REQ-CORE-090: ListFiles concurrency defines safe concurrent access [type: architectural]. [api_core.md#1235-packagereaderlistfiles-concurrency](../tech_specs/api_core.md#1235-packagereaderlistfiles-concurrency) +- REQ-CORE-050: ListFiles returns results sorted by PrimaryPath alphabetically [type: constraint]. [api_core.md#1233-packagelistfiles-behavior](../tech_specs/api_core.md#1233-packagelistfiles-behavior) +- REQ-CORE-051: ListFiles results are stable across calls when package state unchanged [type: constraint]. [api_core.md#1233-packagelistfiles-behavior](../tech_specs/api_core.md#1233-packagelistfiles-behavior) +- REQ-CORE-085: ListFiles purpose defines file information retrieval [type: architectural]. [api_core.md#123-packagelistfiles-method](../tech_specs/api_core.md#123-packagelistfiles-method) +- REQ-CORE-086: ListFiles parameters define pure in-memory operation [type: architectural]. 
[api_core.md#1231-packagelistfiles-parameters](../tech_specs/api_core.md#1231-packagelistfiles-parameters) +- REQ-CORE-087: ListFiles returns define sorted file information slice [type: architectural]. [api_core.md#1232-packagelistfiles-returns](../tech_specs/api_core.md#1232-packagelistfiles-returns) +- REQ-CORE-088: ListFiles behavior defines sorting, stability, and mutation handling [type: architectural]. [api_core.md#1233-packagelistfiles-behavior](../tech_specs/api_core.md#1233-packagelistfiles-behavior) +- REQ-CORE-089: ListFiles error conditions reference common read error mapping table [type: architectural]. [api_core.md#1234-packagelistfiles-error-conditions](../tech_specs/api_core.md#1234-packagelistfiles-error-conditions) +- REQ-CORE-090: ListFiles concurrency defines safe concurrent access [type: architectural]. [api_core.md#1235-packagelistfiles-concurrency](../tech_specs/api_core.md#1235-packagelistfiles-concurrency) - REQ-CORE-091: FileInfo usage patterns demonstrate common FileInfo operations [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). [api_core.md#124-fileinfo-structure](../tech_specs/api_core.md#124-fileinfo-structure) - REQ-CORE-092: Basic listing example demonstrates simple file listing [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). [api_core.md#1243-example---fileinfo-usage---basic-listing](../tech_specs/api_core.md#1243-example---fileinfo-usage---basic-listing) - REQ-CORE-093: Filter by type example demonstrates type-based filtering [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). [api_core.md#1244-example---fileinfo-usage---filter-by-type](../tech_specs/api_core.md#1244-example---fileinfo-usage---filter-by-type) @@ -56,38 +56,38 @@ - REQ-CORE-071: FileInfo includes version tracking for file content and metadata changes [type: constraint]. 
[api_core.md#1241-fileinfo-field-descriptions](../tech_specs/api_core.md#1241-fileinfo-field-descriptions) - REQ-CORE-072: FileInfo includes metadata indicators for tag presence [type: constraint]. [api_core.md#1241-fileinfo-field-descriptions](../tech_specs/api_core.md#1241-fileinfo-field-descriptions) - REQ-CORE-073: FileInfo balances comprehensiveness with performance for listing operations [type: architectural]. [api_core.md#1242-fileinfo-design-rationale](../tech_specs/api_core.md#1242-fileinfo-design-rationale) -- REQ-CORE-052: GetInfo returns lightweight package information [type: architectural]. [api_core.md#125-packagereadergetinfo-method](../tech_specs/api_core.md#125-packagereadergetinfo-method) -- REQ-CORE-053: GetMetadata returns comprehensive package metadata [type: architectural]. [api_core.md#126-packagereadergetmetadata-method](../tech_specs/api_core.md#126-packagereadergetmetadata-method) +- REQ-CORE-052: GetInfo returns lightweight package information [type: architectural]. [api_core.md#125-packagegetinfo-method](../tech_specs/api_core.md#125-packagegetinfo-method) +- REQ-CORE-053: GetMetadata returns comprehensive package metadata [type: architectural]. [api_core.md#126-packagegetmetadata-method](../tech_specs/api_core.md#126-packagegetmetadata-method) - REQ-CORE-095: Calculate compression ratios example demonstrates compression ratio calculation [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). [api_core.md#1246-example---fileinfo-usage---calculate-compression-ratios](../tech_specs/api_core.md#1246-example---fileinfo-usage---calculate-compression-ratios) - REQ-CORE-096: Check for duplicates example demonstrates deduplication by checksum [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). 
[api_core.md#1247-example---fileinfo-usage---check-for-duplicates](../tech_specs/api_core.md#1247-example---fileinfo-usage---check-for-duplicates) - REQ-CORE-097: Verify content integrity example demonstrates checksum verification [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). [api_core.md#1248-example---fileinfo-usage---verify-content-integrity](../tech_specs/api_core.md#1248-example---fileinfo-usage---verify-content-integrity) - REQ-CORE-098: Find files with multiple paths example demonstrates multi-path file detection [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). [api_core.md#1249-example---fileinfo-usage---find-files-with-multiple-paths](../tech_specs/api_core.md#1249-example---fileinfo-usage---find-files-with-multiple-paths) - REQ-CORE-099: Collection example patterns demonstrate samber/lo usage for collection operations [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). [api_core.md#12410-fileinfo-usage---collection-example-patterns](../tech_specs/api_core.md#12410-fileinfo-usage---collection-example-patterns) -- REQ-CORE-100: GetInfo purpose defines lightweight package information retrieval [type: architectural]. [api_core.md#125-packagereadergetinfo-method](../tech_specs/api_core.md#125-packagereadergetinfo-method) -- REQ-CORE-101: GetInfo parameters define pure in-memory operation [type: architectural]. [api_core.md#1251-packagereadergetinfo-parameters](../tech_specs/api_core.md#1251-packagereadergetinfo-parameters) -- REQ-CORE-102: GetInfo returns define lightweight package information structure [type: architectural]. [api_core.md#1252-packagereadergetinfo-returns](../tech_specs/api_core.md#1252-packagereadergetinfo-returns) -- REQ-CORE-103: GetInfo scope defines lightweight view without additional I/O [type: constraint]. 
[api_core.md#1253-packagereadergetinfo-scope](../tech_specs/api_core.md#1253-packagereadergetinfo-scope) -- REQ-CORE-104: PackageInfo contents define header-derived and computed package-level statistics [type: architectural]. [api_core.md#125-packagereadergetinfo-method](../tech_specs/api_core.md#125-packagereadergetinfo-method) -- REQ-CORE-105: GetInfo does not include individual FileEntry metadata or special metadata file contents [type: constraint]. [api_core.md#125-packagereadergetinfo-method](../tech_specs/api_core.md#125-packagereadergetinfo-method) -- REQ-CORE-106: GetInfo error conditions reference common error mapping table [type: architectural]. [api_core.md#1254-packagereadergetinfo-error-conditions](../tech_specs/api_core.md#1254-packagereadergetinfo-error-conditions) -- REQ-CORE-107: GetInfo concurrency defines safe concurrent access [type: architectural]. [api_core.md#1255-packagereadergetinfo-concurrency](../tech_specs/api_core.md#1255-packagereadergetinfo-concurrency) -- REQ-CORE-108: GetMetadata purpose defines comprehensive metadata retrieval [type: architectural]. [api_core.md#126-packagereadergetmetadata-method](../tech_specs/api_core.md#126-packagereadergetmetadata-method) -- REQ-CORE-109: GetMetadata parameters define pure in-memory operation [type: architectural]. [api_core.md#1261-packagereadergetmetadata-parameters](../tech_specs/api_core.md#1261-packagereadergetmetadata-parameters) -- REQ-CORE-110: GetMetadata returns define comprehensive package metadata structure [type: architectural]. [api_core.md#1262-packagereadergetmetadata-returns](../tech_specs/api_core.md#1262-packagereadergetmetadata-returns) -- REQ-CORE-111: GetMetadata scope defines full metadata view without additional I/O [type: constraint]. [api_core.md#1263-packagereadergetmetadata-scope](../tech_specs/api_core.md#1263-packagereadergetmetadata-scope) -- REQ-CORE-112: PackageMetadata contents reference PackageInfo structure definition [type: architectural]. 
[api_core.md#126-packagereadergetmetadata-method](../tech_specs/api_core.md#126-packagereadergetmetadata-method)
-- REQ-CORE-113: GetMetadata serialization references package information methods [type: architectural]. [api_core.md#1264-packagereadergetmetadata-serialization](../tech_specs/api_core.md#1264-packagereadergetmetadata-serialization)
-- REQ-CORE-114: GetMetadata error conditions define internal consistency failure handling [type: constraint]. [api_core.md#1265-packagereadergetmetadata-error-conditions](../tech_specs/api_core.md#1265-packagereadergetmetadata-error-conditions)
-- REQ-CORE-115: GetMetadata concurrency defines safe concurrent access [type: architectural]. [api_core.md#1266-packagereadergetmetadata-concurrency](../tech_specs/api_core.md#1266-packagereadergetmetadata-concurrency)
-- REQ-CORE-116: Validate method contract defines package validation interface [type: architectural]. [api_core.md#127-packagereadervalidate-method](../tech_specs/api_core.md#127-packagereadervalidate-method)
-- REQ-CORE-117: Validate purpose defines package format, structure, and integrity validation [type: architectural]. [api_core.md#127-packagereadervalidate-method](../tech_specs/api_core.md#127-packagereadervalidate-method)
-- REQ-CORE-118: Validate parameters define context for cancellation and timeout [type: architectural]. [api_core.md#1271-packagereadervalidate-parameters](../tech_specs/api_core.md#1271-packagereadervalidate-parameters)
-- REQ-CORE-119: Validate returns define error return for validation failures [type: architectural]. [api_core.md#1272-packagereadervalidate-returns](../tech_specs/api_core.md#1272-packagereadervalidate-returns)
-- REQ-CORE-120: Validate behavior defines validation process [type: architectural]. [api_core.md#1273-packagereadervalidate-behavior](../tech_specs/api_core.md#1273-packagereadervalidate-behavior)
-- REQ-CORE-121: Validate error conditions reference common error mapping table [type: architectural]. [api_core.md#1274-packagereadervalidate-error-conditions](../tech_specs/api_core.md#1274-packagereadervalidate-error-conditions)
-- REQ-CORE-122: Validate concurrency defines safe concurrent access [type: architectural]. [api_core.md#1275-packagereadervalidate-concurrency](../tech_specs/api_core.md#1275-packagereadervalidate-concurrency)
-- REQ-CORE-123: Common error mapping table defines error mapping for all PackageReader methods [type: architectural]. [api_core.md#128-packagereader-common-error-mapping-table](../tech_specs/api_core.md#128-packagereader-common-error-mapping-table)
-- REQ-CORE-124: Memory versus disk side effects define PackageWriter write operations [type: architectural]. [api_core.md#131-memory-versus-disk-side-effects](../tech_specs/api_core.md#131-memory-versus-disk-side-effects)
+- REQ-CORE-100: GetInfo purpose defines lightweight package information retrieval [type: architectural]. [api_core.md#125-packagegetinfo-method](../tech_specs/api_core.md#125-packagegetinfo-method)
+- REQ-CORE-101: GetInfo parameters define pure in-memory operation [type: architectural]. [api_core.md#1251-packagegetinfo-parameters](../tech_specs/api_core.md#1251-packagegetinfo-parameters)
+- REQ-CORE-102: GetInfo returns define lightweight package information structure [type: architectural]. [api_core.md#1252-packagegetinfo-returns](../tech_specs/api_core.md#1252-packagegetinfo-returns)
+- REQ-CORE-103: GetInfo scope defines lightweight view without additional I/O [type: constraint]. [api_core.md#1253-packagegetinfo-scope](../tech_specs/api_core.md#1253-packagegetinfo-scope)
+- REQ-CORE-104: PackageInfo contents define header-derived and computed package-level statistics [type: architectural]. [api_core.md#125-packagegetinfo-method](../tech_specs/api_core.md#125-packagegetinfo-method)
+- REQ-CORE-105: GetInfo does not include individual FileEntry metadata or special metadata file contents [type: constraint]. [api_core.md#125-packagegetinfo-method](../tech_specs/api_core.md#125-packagegetinfo-method)
+- REQ-CORE-106: GetInfo error conditions reference common read error mapping table [type: architectural]. [api_core.md#1254-packagegetinfo-error-conditions](../tech_specs/api_core.md#1254-packagegetinfo-error-conditions)
+- REQ-CORE-107: GetInfo concurrency defines safe concurrent access [type: architectural]. [api_core.md#1255-packagegetinfo-concurrency](../tech_specs/api_core.md#1255-packagegetinfo-concurrency)
+- REQ-CORE-108: GetMetadata purpose defines comprehensive metadata retrieval [type: architectural]. [api_core.md#126-packagegetmetadata-method](../tech_specs/api_core.md#126-packagegetmetadata-method)
+- REQ-CORE-109: GetMetadata parameters define pure in-memory operation [type: architectural]. [api_core.md#1261-packagegetmetadata-parameters](../tech_specs/api_core.md#1261-packagegetmetadata-parameters)
+- REQ-CORE-110: GetMetadata returns define comprehensive package metadata structure [type: architectural]. [api_core.md#1262-packagegetmetadata-returns](../tech_specs/api_core.md#1262-packagegetmetadata-returns)
+- REQ-CORE-111: GetMetadata scope defines full metadata view without additional I/O [type: constraint]. [api_core.md#1263-packagegetmetadata-scope](../tech_specs/api_core.md#1263-packagegetmetadata-scope)
+- REQ-CORE-112: PackageMetadata contents reference PackageInfo structure definition [type: architectural]. [api_core.md#126-packagegetmetadata-method](../tech_specs/api_core.md#126-packagegetmetadata-method)
+- REQ-CORE-113: GetMetadata serialization references package information methods [type: architectural]. [api_core.md#1264-packagegetmetadata-serialization](../tech_specs/api_core.md#1264-packagegetmetadata-serialization)
+- REQ-CORE-114: GetMetadata error conditions define internal consistency failure handling [type: constraint]. [api_core.md#1265-packagegetmetadata-error-conditions](../tech_specs/api_core.md#1265-packagegetmetadata-error-conditions)
+- REQ-CORE-115: GetMetadata concurrency defines safe concurrent access [type: architectural]. [api_core.md#1266-packagegetmetadata-concurrency](../tech_specs/api_core.md#1266-packagegetmetadata-concurrency)
+- REQ-CORE-116: Validate method contract defines package validation interface [type: architectural]. [api_core.md#127-packagevalidate-method](../tech_specs/api_core.md#127-packagevalidate-method)
+- REQ-CORE-117: Validate purpose defines package format, structure, and integrity validation [type: architectural]. [api_core.md#127-packagevalidate-method](../tech_specs/api_core.md#127-packagevalidate-method)
+- REQ-CORE-118: Validate parameters define context for cancellation and timeout [type: architectural]. [api_core.md#1271-packagevalidate-parameters](../tech_specs/api_core.md#1271-packagevalidate-parameters)
+- REQ-CORE-119: Validate returns define error return for validation failures [type: architectural]. [api_core.md#1272-packagevalidate-returns](../tech_specs/api_core.md#1272-packagevalidate-returns)
+- REQ-CORE-120: Validate behavior defines validation process [type: architectural]. [api_core.md#1273-packagevalidate-behavior](../tech_specs/api_core.md#1273-packagevalidate-behavior)
+- REQ-CORE-121: Validate error conditions reference common read error mapping table [type: architectural]. [api_core.md#1274-packagevalidate-error-conditions](../tech_specs/api_core.md#1274-packagevalidate-error-conditions)
+- REQ-CORE-122: Validate concurrency defines safe concurrent access [type: architectural]. [api_core.md#1275-packagevalidate-concurrency](../tech_specs/api_core.md#1275-packagevalidate-concurrency)
+- REQ-CORE-123: Common read error mapping table defines error mapping for all package read operations [type: architectural]. [api_core.md#128-common-read-error-mapping-table](../tech_specs/api_core.md#128-common-read-error-mapping-table)
+- REQ-CORE-124: Memory versus disk side effects define package write operations [type: architectural]. [api_core.md#131-memory-versus-disk-side-effects](../tech_specs/api_core.md#131-memory-versus-disk-side-effects)
 - REQ-CORE-125: Write operations define Write, SafeWrite, and FastWrite methods [type: architectural]. [api_core.md#1311-write-operations](../tech_specs/api_core.md#1311-write-operations)
 - REQ-CORE-126: Write durability defines when changes are written to disk [type: constraint]. [api_core.md#1312-write-durability](../tech_specs/api_core.md#1312-write-durability)
 - REQ-CORE-127: Target path configuration defines how package target path is configured [type: architectural]. [api_core.md#1313-writing---target-path-configuration](../tech_specs/api_core.md#1313-writing---target-path-configuration)
@@ -123,14 +123,14 @@
 - REQ-CORE-157: SafeWrite overwrite control requires overwrite flag for existing files [type: constraint]. [api_writing.md#11-packagesafewrite-method](../tech_specs/api_writing.md#11-packagesafewrite-method)
 - REQ-CORE-158: FastWrite details define in-place update requirements [type: architectural]. [api_writing.md#2-fastwrite---in-place-package-updates](../tech_specs/api_writing.md#2-fastwrite---in-place-package-updates)
 - REQ-CORE-159: Signed package overwrite restriction prevents overwriting signed packages [type: constraint]. [api_writing.md#44-signed-package-writing-error-conditions](../tech_specs/api_writing.md#44-signed-package-writing-error-conditions)
-- REQ-CORE-160: Common writer error mapping table defines error mapping for all PackageWriter methods [type: architectural]. [api_core.md#132-common-writer-error-mapping-table](../tech_specs/api_core.md#132-common-writer-error-mapping-table)
-- REQ-CORE-074: ReadFile purpose is to read file content from package as bytes. [api_core.md#122-packagereaderreadfile-method](../tech_specs/api_core.md#122-packagereaderreadfile-method)
-- REQ-CORE-075: ReadFile parameters include context and path. [api_core.md#1221-packagereaderreadfile-parameters](../tech_specs/api_core.md#1221-packagereaderreadfile-parameters)
-- REQ-CORE-076: ReadFile returns file content as byte slice. [api_core.md#1222-packagereaderreadfile-returns](../tech_specs/api_core.md#1222-packagereaderreadfile-returns)
-- REQ-CORE-077: ReadFile behavior includes location, decompression, and decryption, and obeys encryption and validation. [api_core.md#1223-packagereaderreadfile-behavior](../tech_specs/api_core.md#1223-packagereaderreadfile-behavior)
-- REQ-CORE-078: ReadFile error conditions handle missing files and processing errors. [api_core.md#1224-packagereaderreadfile-error-conditions](../tech_specs/api_core.md#1224-packagereaderreadfile-error-conditions)
-- REQ-CORE-079: ReadFile is safe for concurrent reads from different goroutines. [api_core.md#1225-packagereaderreadfile-concurrency](../tech_specs/api_core.md#1225-packagereaderreadfile-concurrency)
-- REQ-CORE-054: PackageWriter memory versus disk side effects are clearly defined [type: architectural]. [api_core.md#131-memory-versus-disk-side-effects](../tech_specs/api_core.md#131-memory-versus-disk-side-effects)
+- REQ-CORE-160: Common writer error mapping table defines error mapping for all package write operations [type: architectural]. [api_core.md#132-common-writer-error-mapping-table](../tech_specs/api_core.md#132-common-writer-error-mapping-table)
+- REQ-CORE-074: ReadFile purpose is to read file content from package as bytes. [api_core.md#122-packagereadfile-method](../tech_specs/api_core.md#122-packagereadfile-method)
+- REQ-CORE-075: ReadFile parameters include context and path. [api_core.md#1221-packagereadfile-parameters](../tech_specs/api_core.md#1221-packagereadfile-parameters)
+- REQ-CORE-076: ReadFile returns file content as byte slice. [api_core.md#1222-packagereadfile-returns](../tech_specs/api_core.md#1222-packagereadfile-returns)
+- REQ-CORE-077: ReadFile behavior includes location, decompression, and decryption, and obeys encryption and validation. [api_core.md#1223-packagereadfile-behavior](../tech_specs/api_core.md#1223-packagereadfile-behavior)
+- REQ-CORE-078: ReadFile error conditions handle missing files and processing errors. [api_core.md#1224-packagereadfile-error-conditions](../tech_specs/api_core.md#1224-packagereadfile-error-conditions)
+- REQ-CORE-079: ReadFile is safe for concurrent reads from different goroutines. [api_core.md#1225-packagereadfile-concurrency](../tech_specs/api_core.md#1225-packagereadfile-concurrency)
+- REQ-CORE-054: Package write memory versus disk side effects are clearly defined [type: architectural]. [api_core.md#131-memory-versus-disk-side-effects](../tech_specs/api_core.md#131-memory-versus-disk-side-effects)
 - ~~REQ-CORE-055: StageFile option precedence rules define option handling~~ [type: obsolete] (obsolete: StageFile replaced by AddFile/AddFileFromMemory high-level methods). [api_core.md#1273-stagefile-option-precedence](../tech_specs/api_core.md#1273-stagefile-option-precedence)
 - REQ-CORE-056: Allowed target paths and overwrite behavior define write restrictions [type: constraint]. [api_writing.md#44-signed-package-writing-error-conditions](../tech_specs/api_writing.md#44-signed-package-writing-error-conditions)
 
@@ -179,7 +179,7 @@
 - REQ-CORE-181: OldContext structure defines source context for error transformation [type: architectural]. [api_core.md#1056-maperror-function](../tech_specs/api_core.md#1056-maperror-function)
 - REQ-CORE-182: NewContext structure defines target context for error transformation [type: architectural]. [api_core.md#1056-maperror-function](../tech_specs/api_core.md#1056-maperror-function)
 - REQ-CORE-183: Transformation usage demonstrates error context transformation with MapError [type: documentation-only] (documentation-only: examples - DO NOT CREATE FEATURE FILE). [api_core.md#1056-maperror-function](../tech_specs/api_core.md#1056-maperror-function)
-- REQ-CORE-184: PackageWriter structured error system uses structured errors exclusively for all PackageWriter methods [type: architectural]. [api_core.md#10-structured-error-system](../tech_specs/api_core.md#10-structured-error-system)
+- REQ-CORE-184: Package write operations use structured errors exclusively for all write methods [type: architectural]. [api_core.md#10-structured-error-system](../tech_specs/api_core.md#10-structured-error-system)
 
 ## Context Integration
 
diff --git a/docs/requirements/metadata.md b/docs/requirements/metadata.md
index c582a8da..8308e64a 100644
--- a/docs/requirements/metadata.md
+++ b/docs/requirements/metadata.md
@@ -51,7 +51,7 @@
 - REQ-META-126: DestPath and DestPathWin support relative paths (resolved from default extraction directory) and absolute paths. [api_metadata.md#81-pathmetadata-structures](../tech_specs/api_metadata.md#81-pathmetadata-structures)
 - REQ-META-127: DestPathWin is used for Windows-specific destination paths; if only DestPath is absolute on Windows, root is treated as C:\\. [api_metadata.md#81-pathmetadata-structures](../tech_specs/api_metadata.md#81-pathmetadata-structures)
 - REQ-META-128: SetDestPath sets destination extraction overrides for a stored path, creating PathMetadataEntry if missing. [api_metadata.md#8216-packagesetdestpath-method](../tech_specs/api_metadata.md#8216-packagesetdestpath-method)
-- REQ-META-129: SetDestPath accepts DestPathOverride struct with optional DestPath and DestPathWin pointers. [api_metadata.md#8111-destpathoverride-structure](../tech_specs/api_metadata.md#8111-destpathoverride-structure)
+- REQ-META-129: SetDestPath accepts DestPathOverride struct with optional DestPath and DestPathWin pointers. [api_metadata.md#8112-destpathoverride-structure](../tech_specs/api_metadata.md#8112-destpathoverride-structure)
 - REQ-META-130: SetDestPathTyped is a generic helper that accepts string or map[string]string input and converts to DestPathOverride. [api_metadata.md#8216-packagesetdestpath-method](../tech_specs/api_metadata.md#8216-packagesetdestpath-method)
 - REQ-META-131: SetDestPath normalizes storedPath by prefixing leading slash if missing before matching. [api_metadata.md#8216-packagesetdestpath-method](../tech_specs/api_metadata.md#8216-packagesetdestpath-method)
 - REQ-META-132: SetDestPath parses string input to determine Windows-only paths (drive letter or UNC) and stores in appropriate field. [api_metadata.md#8216-packagesetdestpath-method](../tech_specs/api_metadata.md#8216-packagesetdestpath-method)
diff --git a/docs/tech_specs/api_basic_operations.md b/docs/tech_specs/api_basic_operations.md
index 4f8d62f6..c61ff18f 100644
--- a/docs/tech_specs/api_basic_operations.md
+++ b/docs/tech_specs/api_basic_operations.md
@@ -6,22 +6,22 @@
 - [0. Overview](#0-overview)
   - [0.1 Cross-References](#01-cross-references)
 - [1. Context Integration](#1-context-integration)
-- [2. Go API v1 Package Organization](#2-go-api-v1-package-organization)
+- [2. Go API V1 Package Organization](#2-go-api-v1-package-organization)
   - [2.1 Module Path](#21-module-path)
     - [2.1.1 API Package Structure](#211-api-package-structure)
     - [2.1.2 Root Package Purpose](#212-root-package-purpose)
     - [2.1.3 Root Package Example Import](#213-root-package-example-import)
-    - [2.1.4 File Format Package: `fileformat`](#214-file-format-package-fileformat)
+    - [2.1.4. File Format Package: Fileformat](#214-file-format-package-fileformat)
     - [2.1.5 File Format Package Key Types](#215-file-format-package-key-types)
     - [2.1.6 File Format Package Example Import](#216-file-format-package-example-import)
     - [2.1.7 Metadata Package Purpose](#217-metadata-package-purpose)
     - [2.1.8 Metadata Package Imports](#218-metadata-package-imports)
-    - [2.1.9 Generics Package: `generics`](#219-generics-package-generics)
+    - [2.1.9. Generics Package: Generics](#219-generics-package-generics)
     - [2.1.10 Generics Package Key Types](#2110-generics-package-key-types)
     - [2.1.11 Generics Package Example Import](#2111-generics-package-example-import)
     - [2.1.12 Error Handling Package Purpose](#2112-error-handling-package-purpose)
     - [2.1.13 Error Handling Package Imports](#2113-error-handling-package-imports)
-    - [2.1.14 Signatures Package: `signatures`](#2114-signatures-package-signatures)
+    - [2.1.14. Signatures Package: Signatures](#2114-signatures-package-signatures)
     - [2.1.15 Signatures Package Key Types](#2115-signatures-package-key-types)
     - [2.1.16 Signatures Package Example Import](#2116-signatures-package-example-import)
     - [2.1.17 Internal Package Purpose](#2117-internal-package-purpose)
@@ -34,9 +34,9 @@
 - [3. Package Structure and Loading](#3-package-structure-and-loading)
   - [3.1 Package Implementation Structure](#31-package-implementation-structure)
   - [3.2 Package Loading Process](#32-package-loading-process)
-    - [3.2.1 `Package.loadSpecialMetadataFiles` Method](#321-packageloadspecialmetadatafiles-method)
-    - [3.2.2 `Package.loadPathMetadata` Method](#322-packageloadpathmetadata-method)
-    - [3.2.3 `Package.updateFilePathAssociations` Method](#323-packageupdatefilepathassociations-method)
+    - [3.2.1 Package.loadSpecialMetadataFiles Method](#321-packageloadspecialmetadatafiles-method)
+    - [3.2.2 Package.loadPathMetadata Method](#322-packageloadpathmetadata-method)
+    - [3.2.3 Package.updateFilePathAssociations Method](#323-packageupdatefilepathassociations-method)
   - [3.3 Package Implementation Details](#33-package-implementation-details)
     - [3.3.1 Package Structure Implementation](#331-package-structure-implementation)
     - [3.3.2 Data Loading Strategies](#332-data-loading-strategies)
@@ -50,96 +50,96 @@
 - [5. Package Lifecycle Operations](#5-package-lifecycle-operations)
   - [5.1 Package Lifecycle - Always Use Defer for Cleanup](#51-package-lifecycle---always-use-defer-for-cleanup)
   - [5.2 Package Lifecycle - Check Package State Before Operations](#52-package-lifecycle---check-package-state-before-operations)
-  - [5.3 Package Lifecycle - Use appropriate context timeouts](#53-package-lifecycle---use-appropriate-context-timeouts)
-- [6. `NewPackage` Constructor Function](#6-newpackage-function)
+  - [5.3. Package Lifecycle - Use Appropriate Context Timeouts](#53-package-lifecycle---use-appropriate-context-timeouts)
+- [6. NewPackage Function](#6-newpackage-function)
   - [6.1 NewPackage Behavior](#61-newpackage-behavior)
   - [6.2 NewPackage Example Usage](#62-newpackage-example-usage)
-- [7. `NewPackageWithOptions` Constructor Function](#7-newpackagewithoptions-function)
+- [7. NewPackageWithOptions Function](#7-newpackagewithoptions-function)
   - [7.1 NewPackageWithOptions Parameters](#71-newpackagewithoptions-parameters)
   - [7.2 NewPackageWithOptions Behavior](#72-newpackagewithoptions-behavior)
   - [7.3 NewPackageWithOptions Error Conditions](#73-newpackagewithoptions-error-conditions)
   - [7.4 NewPackageWithOptions Example Usage](#74-newpackagewithoptions-example-usage)
   - [7.5 NewPackageWithOptions Without Path](#75-newpackagewithoptions-without-path)
-  - [7.6 `CreateOptions` Structure](#76-createoptions-structure)
-- [8. `Package.SetTargetPath` Method](#8-packagesettargetpath-method)
+  - [7.6 CreateOptions Structure](#76-createoptions-structure)
+- [8. Package.SetTargetPath Method](#8-packagesettargetpath-method)
   - [8.1 Package.SetTargetPath Parameters](#81-packagesettargetpath-parameters)
   - [8.2 Package.SetTargetPath Behavior](#82-packagesettargetpath-behavior)
   - [8.3 Package.SetTargetPath Method Error Conditions](#83-packagesettargetpath-method-error-conditions)
   - [8.4 Package.SetTargetPath Example Usage](#84-packagesettargetpath-example-usage)
-  - [8.5 Package.SetTargetPath vs NewPackageWithOptions](#85-packagesettargetpath-vs-newpackagewithoptions)
+  - [8.5. Package.SetTargetPath vs NewPackageWithOptions](#85-packagesettargetpath-vs-newpackagewithoptions)
 - [9. Package Configuration](#9-package-configuration)
-  - [9.1 `PackageConfig` Structure](#91-packageconfig-structure)
+  - [9.1 PackageConfig Structure](#91-packageconfig-structure)
     - [9.1.1 PackageConfig Fields](#911-packageconfig-fields)
   - [9.2 PackageConfig Backward Compatibility](#92-packageconfig-backward-compatibility)
-  - [9.3 `PathHandling` Type](#93-pathhandling-type)
-- [10. `OpenPackage`](#10-openpackage-function)
+  - [9.3 PathHandling Type](#93-pathhandling-type)
+- [10. OpenPackage Function](#10-openpackage-function)
   - [10.1 OpenPackage Parameters](#101-openpackage-parameters)
   - [10.2 OpenPackage Behavior](#102-openpackage-behavior)
   - [10.3 OpenPackage Method Error Conditions](#103-openpackage-method-error-conditions)
     - [10.3.1 OpenPackage Example Usage](#1031-openpackage-example-usage)
-- [11. Opening Packages as Read-Only](#11-opening-packages-as-read-only)
+- [11. Opening Packages As Read-Only](#11-opening-packages-as-read-only)
   - [11.1 Read-Only Enforcement Mechanism](#111-read-only-enforcement-mechanism)
     - [11.1.1 Mutating Methods That Must Be Rejected](#1111-mutating-methods-that-must-be-rejected)
-  - [11.2 `OpenPackageReadOnly` Function](#112-openpackagereadonly-function)
+  - [11.2 OpenPackageReadOnly Function](#112-openpackagereadonly-function)
     - [11.2.1 OpenPackageReadOnly Behavior](#1121-openpackagereadonly-behavior)
     - [11.2.2 OpenPackageReadOnly Method Error Conditions](#1122-openpackagereadonly-method-error-conditions)
-  - [11.3 `readOnlyPackage` Structure](#113-readonlypackage-struct)
-  - [11.4 `readOnlyPackage.readOnlyError` Helper](#114-readonlypackagereadonlyerror-method)
-  - [11.5 `ReadOnlyErrorContext` Structure](#115-readonlyerrorcontext-structure)
-  - [11.6 `readOnlyPackage` Implementation Methods](#116-readonlypackage-implementation-methods)
-- [12. `OpenBrokenPackage` Function](#12-openbrokenpackage-function)
-- [13. `Package.Close` Method](#13-packageclose-method)
-  - [13.1 `Package.Close` Behavior](#131-packageclose-behavior)
-  - [13.2 `Package.Close` Method Error Conditions](#132-packageclose-method-error-conditions)
-  - [13.3 `Package.Close` Example Usage](#133-packageclose-example-usage)
-- [14. `Package.CloseWithCleanup` Method](#14-packageclosewithcleanup-method)
-  - [14.1 `Package.CloseWithCleanup` Behavior](#141-packageclosewithcleanup-behavior)
-- [15. `Package.Validate` Method](#15-packagevalidate-method)
-  - [15.1 `Package.Validate` Behavior](#151-packagevalidate-behavior)
-  - [15.2 `Package.Validate` Method Error Conditions](#152-packagevalidate-method-error-conditions)
-  - [15.3 `Package.Validate` Example Usage](#153-packagevalidate-example-usage)
-- [16. `Package.Defragment` Method](#16-packagedefragment-method)
-  - [16.1 `Package.Defragment` Behavior](#161-packagedefragment-behavior)
-  - [16.2 `Package.Defragment` Error Conditions](#162-packagedefragment-error-conditions)
-  - [16.3 `Package.Defragment` Example Usage](#163-packagedefragment-example-usage)
-- [17. `Package.GetInfo` Method](#17-packagegetinfo-method)
-  - [17.1 `Package.GetInfo` Error Conditions](#171-packagegetinfo-error-conditions)
-  - [17.2 `Package.GetInfo` Example Usage](#172-packagegetinfo-example-usage)
+  - [11.3 readOnlyPackage Struct](#113-readonlypackage-struct)
+  - [11.4 readOnlyPackage.readOnlyError Method](#114-readonlypackagereadonlyerror-method)
+  - [11.5 ReadOnlyErrorContext Structure](#115-readonlyerrorcontext-structure)
+  - [11.6. ReadOnlyPackage Implementation Methods](#116-readonlypackage-implementation-methods)
+- [12. OpenBrokenPackage Function](#12-openbrokenpackage-function)
+- [13. Package.Close Method](#13-packageclose-method)
+  - [13.1 Package.Close Behavior](#131-packageclose-behavior)
+  - [13.2 Package.Close Method Error Conditions](#132-packageclose-method-error-conditions)
+  - [13.3 Package.Close Example Usage](#133-packageclose-example-usage)
+- [14. Package.CloseWithCleanup Method](#14-packageclosewithcleanup-method)
+  - [14.1 Package.CloseWithCleanup Behavior](#141-packageclosewithcleanup-behavior)
+- [15. Package.Validate Method](#15-packagevalidate-method)
+  - [15.1 Package.Validate Behavior](#151-packagevalidate-behavior)
+  - [15.2 Package.Validate Method Error Conditions](#152-packagevalidate-method-error-conditions)
+  - [15.3 Package.Validate Example Usage](#153-packagevalidate-example-usage)
+- [16. Package.Defragment Method](#16-packagedefragment-method)
+  - [16.1 Package.Defragment Behavior](#161-packagedefragment-behavior)
+  - [16.2 Package.Defragment Error Conditions](#162-packagedefragment-error-conditions)
+  - [16.3 Package.Defragment Example Usage](#163-packagedefragment-example-usage)
+- [17. Package.GetInfo Method](#17-packagegetinfo-method)
+  - [17.1 Package.GetInfo Error Conditions](#171-packagegetinfo-error-conditions)
+  - [17.2 Package.GetInfo Example Usage](#172-packagegetinfo-example-usage)
 - [18. Header Inspection](#18-header-inspection)
   - [18.1 Header Inspection Use Cases](#181-header-inspection-use-cases)
-  - [18.2 ReadHeader vs ReadHeaderFromPath](#182-readheader-vs-readheaderfrompath)
-  - [18.3 `ReadHeader` Function](#183-readheader-function)
-  - [18.4 `ReadHeaderFromPath` Function](#184-readheaderfrompath-function)
-    - [18.4.1 `ReadHeaderFromPath` Parameters](#1841-readheaderfrompath-parameters)
-    - [18.4.2 `ReadHeaderFromPath` Error Conditions](#1842-readheaderfrompath-error-conditions)
-    - [18.4.3 `ReadHeaderFromPath` Example Usage](#1843-readheaderfrompath-example-usage)
-  - [18.5 `Package.ReadHeader` Method](#185-packagereadheader-method)
-    - [18.5.1 `Package.ReadHeader` Parameters](#1851-packagereadheader-parameters)
-    - [18.5.2 `Package.ReadHeader` Error Conditions](#1852-packagereadheader-error-conditions)
-  - [18.6 `Package.IsOpen` Method](#186-packageisopen-method)
-  - [18.7 `Package.IsReadOnly` Method](#187-packageisreadonly-method)
-  - [18.8 `Package.GetPath` Method](#188-packagegetpath-method)
+  - [18.2. ReadHeader vs ReadHeaderFromPath](#182-readheader-vs-readheaderfrompath)
+  - [18.3 ReadHeader Function](#183-readheader-function)
+  - [18.4 ReadHeaderFromPath Function](#184-readheaderfrompath-function)
+    - [18.4.1 ReadHeaderFromPath Parameters](#1841-readheaderfrompath-parameters)
+    - [18.4.2 ReadHeaderFromPath Error Conditions](#1842-readheaderfrompath-error-conditions)
+    - [18.4.3 ReadHeaderFromPath Example Usage](#1843-readheaderfrompath-example-usage)
+  - [18.5 Package.ReadHeader Method](#185-packagereadheader-method)
+    - [18.5.1 Package.ReadHeader Parameters](#1851-packagereadheader-parameters)
+    - [18.5.2 Package.ReadHeader Error Conditions](#1852-packagereadheader-error-conditions)
+  - [18.6 Package.IsOpen Method](#186-packageisopen-method)
+  - [18.7 Package.IsReadOnly Method](#187-packageisreadonly-method)
+  - [18.8 Package.GetPath Method](#188-packagegetpath-method)
 - [19. Package Session Base Management](#19-package-session-base-management)
   - [19.1 Session Base for File Addition](#191-session-base-for-file-addition)
   - [19.2 Session Base for File Extraction](#192-session-base-for-file-extraction)
   - [19.3 Session Base Lifecycle](#193-session-base-lifecycle)
-  - [19.4 `Package.SetSessionBase` Method](#194-packagesetsessionbase-method)
-    - [19.4.1 `Package.SetSessionBase` Parameters](#1941-packagesetsessionbase-parameters)
+  - [19.4 Package.SetSessionBase Method](#194-packagesetsessionbase-method)
+    - [19.4.1 Package.SetSessionBase Parameters](#1941-packagesetsessionbase-parameters)
     - [19.4.2 Package.SetSessionBase Returns](#1942-packagesetsessionbase-returns)
     - [19.4.3 Package.SetSessionBase Example Usage](#1943-packagesetsessionbase-example-usage)
   - [19.5 Package.GetSessionBase Method](#195-packagegetsessionbase-method)
     - [19.5.1 Package.GetSessionBase Returns](#1951-packagegetsessionbase-returns)
     - [19.5.2 Package.GetSessionBase Example](#1952-packagegetsessionbase-example)
-  - [19.6 `Package.ClearSessionBase` Method](#196-packageclearsessionbase-method)
+  - [19.6 Package.ClearSessionBase Method](#196-packageclearsessionbase-method)
     - [19.6.1 Package.ClearSessionBase Example](#1961-packageclearsessionbase-example)
   - [19.7 Package.HasSessionBase Method](#197-packagehassessionbase-method)
     - [19.7.1 Package.HasSessionBase Returns](#1971-packagehassessionbase-returns)
     - [19.7.2 Package.HasSessionBase Example](#1972-packagehassessionbase-example)
 - [20. Structured Error System](#20-structured-error-system)
   - [20.1 Error Types Used](#201-error-types-used)
-  - [20.2 `PackageErrorContext` Structure](#202-packageerrorcontext-structure)
-  - [20.3 `SecurityErrorContext` Structure](#203-securityerrorcontext-structure)
-  - [20.4 `IOErrorContext` Structure](#204-ioerrorcontext-structure)
+  - [20.2 PackageErrorContext Structure](#202-packageerrorcontext-structure)
+  - [20.3 SecurityErrorContext Structure](#203-securityerrorcontext-structure)
+  - [20.4 IOErrorContext Structure](#204-ioerrorcontext-structure)
   - [20.5 Creating Errors with Context](#205-creating-errors-with-context)
   - [20.6 Error Inspection](#206-error-inspection)
 - [21. Error Handling Best Practices](#21-error-handling-best-practices)
@@ -452,7 +452,7 @@ The `Package` interface provides the public API for package operations, while `f
 
 The canonical `Package` interface definition is specified in [Core Package Interface API - Package Interface](api_core.md#11-package-interface).
 
-The `Package` interface provides a unified API that combines PackageReader, PackageWriter, lifecycle operations, file management, metadata operations, compression operations, and session base management.
+The `Package` interface provides a unified API that combines read operations, write operations, lifecycle operations, file management, metadata operations, compression operations, and session base management.
 
 For session base management (used for both file addition and extraction operations), see [Package Session Base Management](#19-package-session-base-management).
 
@@ -1053,7 +1053,7 @@ if err != nil {
 }
 ```
 
-### 8.5. Package.SetTargetPath Vs NewPackageWithOptions
+### 8.5. Package.SetTargetPath vs NewPackageWithOptions
 
 - `NewPackageWithOptions`: Used for initial package creation with configuration options, including optional path
 - `SetTargetPath`: Used to change the write path on an existing package (created or opened)
@@ -1194,7 +1194,7 @@ This can be achieved by returning a distinct wrapper type as the dynamic type be
 
 The wrapper must reject all methods that mutate package state in memory or write to disk.
 
-This includes all PackageWriter methods, all state-changing metadata setters, and lifecycle methods that change the target path or package configuration for writing.
+This includes all package write operations (Write, SafeWrite, FastWrite), all state-changing metadata setters, and lifecycle methods that change the target path or package configuration for writing.
 
 At minimum, the wrapper must reject Create, SetTargetPath, Defragment, AddFile, AddFileFromMemory, AddFilePattern, AddDirectory, RemoveFile, RemoveFilePattern, Write, SafeWrite, FastWrite, SetComment, ClearComment, SetAppID, ClearAppID, SetVendorID, ClearVendorID, SetPackageIdentity, and ClearPackageIdentity.
 
@@ -1436,10 +1436,7 @@ if err != nil {
 
 ## 17. Package.GetInfo Method
 
-```go
-// GetInfo gets basic package information
-func (p *Package) GetInfo() (*PackageInfo, error)
-```
+Note: The canonical signature for `GetInfo` is defined in [Core Package Interface API - Package.GetInfo](api_core.md#125-packagegetinfo-method).
 
 This function retrieves comprehensive information about the current package.
 
@@ -1480,7 +1477,7 @@ These are low-level functions for header-only inspection without opening the ful
 - Stream processing where only header information is needed
 - Quick validation of package files without full I/O overhead
 
-### 18.2. ReadHeader Vs ReadHeaderFromPath
+### 18.2. ReadHeader vs ReadHeaderFromPath
 
 - `ReadHeader`: Use when you have an existing `io.Reader` or need fine-grained control over file operations
 - `ReadHeaderFromPath`: Use when you want a simple, one-line header read from a file path with automatic file management
diff --git a/docs/tech_specs/api_core.md b/docs/tech_specs/api_core.md
index 7a0746b2..2174807e 100644
--- a/docs/tech_specs/api_core.md
+++ b/docs/tech_specs/api_core.md
@@ -8,16 +8,16 @@
     - [1.1.1 filePackage Struct](#111-filepackage-struct)
     - [1.1.2 Basic Operations](#112-basic-operations)
     - [1.1.3 File Management Operations](#113-file-management-operations)
-  - [1.2 PackageReader Interface](#12-packagereader-interface)
-    - [1.2.1 PackageReader Contract](#121-packagereader-contract)
-    - [1.2.2 PackageReader.ReadFile Method](#122-packagereaderreadfile-method)
-    - [1.2.3 PackageReader.ListFiles Method](#123-packagereaderlistfiles-method)
+  - [1.2 Package Read Operations](#12-package-read-operations)
+    - [1.2.1 Read Operations Contract](#121-read-operations-contract)
+    - [1.2.2 Package.ReadFile Method](#122-packagereadfile-method)
+    - [1.2.3 Package.ListFiles Method](#123-packagelistfiles-method)
     - [1.2.4 FileInfo Structure](#124-fileinfo-structure)
-    - [1.2.5 PackageReader.GetInfo Method](#125-packagereadergetinfo-method)
-    - [1.2.6 PackageReader.GetMetadata Method](#126-packagereadergetmetadata-method)
-    - [1.2.7 PackageReader.Validate Method](#127-packagereadervalidate-method)
-    - [1.2.8 PackageReader Common Error Mapping Table](#128-packagereader-common-error-mapping-table)
-  - [1.3 PackageWriter Interface](#13-packagewriter-interface)
+    - [1.2.5 Package.GetInfo Method](#125-packagegetinfo-method)
+    - [1.2.6 Package.GetMetadata Method](#126-packagegetmetadata-method)
+    - [1.2.7 Package.Validate Method](#127-packagevalidate-method)
+    - [1.2.8 Common Read Error Mapping Table](#128-common-read-error-mapping-table)
+  - [1.3 Package Write Operations](#13-package-write-operations)
     - [1.3.1 Memory Versus Disk Side Effects](#131-memory-versus-disk-side-effects)
     - [1.3.2 Common Writer Error Mapping Table](#132-common-writer-error-mapping-table)
 - [2. Package Path Semantics](#2-package-path-semantics)
@@ -126,27 +126,29 @@ The NovusPack API is designed around core interfaces that provide clear separati
 
 The `Package` interface provides a unified API that combines:
 
-- **PackageReader** methods (embedded) - Read-only operations on opened packages
-- **PackageWriter** methods (embedded) - Write operations to persist package changes
+- **Read operations** - Read-only operations on opened packages (ReadFile, ListFiles, GetInfo, GetMetadata, Validate)
+- **Write operations** - Write operations to persist package changes (Write, SafeWrite, FastWrite)
 - **Lifecycle operations** - Package creation, opening, closing, and state management
 - **File management operations** - Adding, removing, and extracting files (see [File Management API](api_file_mgmt_index.md))
 - **Metadata operations** - Comment, AppID, and VendorID management (see [Package Metadata API](api_metadata.md))
 - **Compression operations** - Package-level compression and decompression (see [Package Compression API](api_package_compression.md))
 - **Session base management** - Automatic path derivation for file operations
 
-**Note on Embedded Interface Methods in Go**: When `Package` embeds `PackageReader` and `PackageWriter`, the methods from those interfaces become part of the `Package` interface. A single implementation satisfies both the embedded interface and the `Package` interface.
-For example, `Package.Validate` and `PackageReader.Validate` are the same method - there is no delegation or wrapper.
-The concrete `filePackage` type implements one `Validate` method that satisfies both interfaces.
-
 ```go
 // Package defines the main interface for NovusPack package operations.
-// Package combines PackageReader and PackageWriter interfaces, providing -// complete package lifecycle management including opening, closing, and -// defragmentation operations. +// Package provides the v1 package API surface for both read and write operations. type Package interface { - // Embedded interfaces - PackageReader - PackageWriter + // Read operations (opened package) + ReadFile(ctx context.Context, path string) ([]byte, error) + ListFiles() ([]FileInfo, error) + GetMetadata() (*PackageMetadata, error) + Validate(ctx context.Context) error + GetInfo() (*PackageInfo, error) + + // Write operations (persist to disk) + Write(ctx context.Context) error + SafeWrite(ctx context.Context, overwrite bool) error + FastWrite(ctx context.Context) error // Lifecycle operations SetTargetPath(ctx context.Context, path string) error @@ -325,35 +327,24 @@ File management operations (adding, removing files): For complete documentation, see [File Management API](api_file_mgmt_index.md). -### 1.2 PackageReader Interface +### 1.2 Package Read Operations -```go -// PackageReader defines the interface for reading operations on a package. -// PackageReader provides methods for reading files, listing files, retrieving -// metadata, and validating package contents. -type PackageReader interface { - ReadFile(ctx context.Context, path string) ([]byte, error) - ListFiles() ([]FileInfo, error) - GetMetadata() (*PackageMetadata, error) - Validate(ctx context.Context) error - GetInfo() (*PackageInfo, error) -} -``` +This section specifies the read operations that are part of the `Package` interface. +These operations were previously documented as a separate read-only interface. -#### 1.2.1 PackageReader Contract +#### 1.2.1 Read Operations Contract -`PackageReader` is the read-only interface for opened packages. -It is embedded in the larger `Package` interface and describes read-only operations available on an opened package instance. 
+The package read operations are available for opened packages. ##### 1.2.1.1 Reader Contract Scope -`PackageReader` methods assume the package has already been opened (via `OpenPackage` or equivalent). -`PackageReader` is not intended to represent header-only or lightweight on-disk inspection. -Header-only inspection is handled by separate functions (for example, `ReadHeader` in [Basic Operations API](api_basic_operations.md)) rather than by a special `PackageReader` implementation. +Package read operations assume the package has already been opened (via `OpenPackage` or equivalent). +These operations are not intended to represent header-only or lightweight on-disk inspection. +Header-only inspection is handled by separate functions (for example, `ReadHeader` in [Basic Operations API](api_basic_operations.md)) rather than by a separate reader interface or wrapper implementation. ##### 1.2.1.2 OpenPackage Eager Metadata Load -`OpenPackage` MUST read into memory all package metadata required for `PackageReader` operations, including: +`OpenPackage` MUST read into memory all package metadata required for package read operations, including: - Package header - File index @@ -379,7 +370,7 @@ Operations that perform I/O retain `context.Context`: ##### 1.2.1.4 Unsupported Operations -`PackageReader` should not include methods that are only meaningful for un-opened, on-disk inspection. +Package read operations should not include methods that are only meaningful for un-opened, on-disk inspection. ##### 1.2.1.5 Code Reuse Requirement @@ -390,23 +381,26 @@ For example, `internal.ReadAndValidateHeader` is shared by both `ReadHeader` (st See [ReadHeader](api_basic_operations.md) function as an example of a lightweight operation for header-only inspection. -#### 1.2.2 PackageReader.ReadFile Method +#### 1.2.2 Package.ReadFile Method Reads file content from the package, applying decryption and decompression as needed. 
-The canonical signature for `ReadFile` is defined in the [PackageReader Interface](#12-packagereader-interface). +```go +// ReadFile reads file content from the package, applying decryption and decompression. +func (p *Package) ReadFile(ctx context.Context, path string) ([]byte, error) +``` -##### 1.2.2.1 PackageReader.ReadFile Parameters +##### 1.2.2.1 Package.ReadFile Parameters - `ctx context.Context` - Context for cancellation and timeout handling - `path string` - Package-internal path to the file (see [Package Path Semantics](#2-package-path-semantics)) -##### 1.2.2.2 PackageReader.ReadFile Returns +##### 1.2.2.2 Package.ReadFile Returns - `[]byte` - File content (decrypted and decompressed) - `error` - Returns `*PackageError` on failure -##### 1.2.2.3 PackageReader.ReadFile Behavior +##### 1.2.2.3 Package.ReadFile Behavior - Reads file data from disk - Locates the FileEntry by normalized package path. @@ -416,30 +410,30 @@ The canonical signature for `ReadFile` is defined in the [PackageReader Interfac - Applies decompression if the file is compressed - Returns decrypted and decompressed content -##### 1.2.2.4 PackageReader.ReadFile Error Conditions +##### 1.2.2.4 Package.ReadFile Error Conditions -See [Common Error Mapping Table](#128-packagereader-common-error-mapping-table). +See [Common Read Error Mapping Table](#128-common-read-error-mapping-table). -##### 1.2.2.5 PackageReader.ReadFile Concurrency +##### 1.2.2.5 Package.ReadFile Concurrency Safe for concurrent reads from different goroutines. -#### 1.2.3 PackageReader.ListFiles Method +#### 1.2.3 Package.ListFiles Method Returns information about all files in the package. -The canonical signature for `ListFiles` is defined in the [PackageReader Interface](#12-packagereader-interface). +Note: The canonical signature for `ListFiles` is defined in [File Query and Inspection Operations - Package.ListFiles](api_file_mgmt_queries.md#112-packagelistfiles-method). 
-##### 1.2.3.1 PackageReader.ListFiles Parameters +##### 1.2.3.1 Package.ListFiles Parameters None (pure in-memory operation). -##### 1.2.3.2 PackageReader.ListFiles Returns +##### 1.2.3.2 Package.ListFiles Returns - `[]FileInfo` - Slice of file information, sorted by PrimaryPath alphabetically - `error` - Returns `*PackageError` on failure -##### 1.2.3.3 PackageReader.ListFiles Behavior +##### 1.2.3.3 Package.ListFiles Behavior - Results MUST be sorted by PrimaryPath (normalized package path), alphabetically - Results MUST be stable across calls when the in-memory package state has not changed @@ -447,11 +441,11 @@ None (pure in-memory operation). - In-memory mutations (for example, via `AddFile` or `RemoveFile`) affect `ListFiles` results immediately, even before a write operation - For files with multiple paths, PrimaryPath is the first path (lexicographically) and all paths appear in the Paths array -##### 1.2.3.4 PackageReader.ListFiles Error Conditions +##### 1.2.3.4 Package.ListFiles Error Conditions -See [Common Error Mapping Table](#128-packagereader-common-error-mapping-table). +See [Common Read Error Mapping Table](#128-common-read-error-mapping-table). -##### 1.2.3.5 PackageReader.ListFiles Concurrency +##### 1.2.3.5 Package.ListFiles Concurrency Safe for concurrent calls from different goroutines. @@ -665,100 +659,108 @@ unique := lo.UniqBy(files, func(f FileInfo) uint32 { _ = unique ``` -#### 1.2.5 PackageReader.GetInfo Method +#### 1.2.5 Package.GetInfo Method Returns lightweight package information derived from header and computed package-level statistics. -The canonical signature for `GetInfo` is defined in the [PackageReader Interface](#12-packagereader-interface). +```go +// GetInfo returns lightweight package information derived from already-loaded package state. +func (p *Package) GetInfo() (*PackageInfo, error) +``` + +Note: See [Basic Operations API - Package.GetInfo](api_basic_operations.md#17-packagegetinfo-method) for usage context. 
-##### 1.2.5.1 PackageReader.GetInfo Parameters +##### 1.2.5.1 Package.GetInfo Parameters None (pure in-memory operation). -##### 1.2.5.2 PackageReader.GetInfo Returns +##### 1.2.5.2 Package.GetInfo Returns - `*PackageInfo` - Lightweight package information (see [PackageInfo Structure](api_metadata.md#71-packageinfo-structure)) - `error` - Returns `*PackageError` on failure -##### 1.2.5.3 PackageReader.GetInfo Scope +##### 1.2.5.3 Package.GetInfo Scope Lightweight view over already-loaded package state. This method MUST NOT perform additional disk I/O. This method MUST NOT perform additional parsing beyond what `OpenPackage` already loaded. -##### 1.2.5.4 PackageReader.GetInfo Error Conditions +##### 1.2.5.4 Package.GetInfo Error Conditions -See [Common Error Mapping Table](#128-packagereader-common-error-mapping-table). +See [Common Read Error Mapping Table](#128-common-read-error-mapping-table). -##### 1.2.5.5 PackageReader.GetInfo Concurrency +##### 1.2.5.5 Package.GetInfo Concurrency - Go: Safe for concurrent calls from different goroutines. -#### 1.2.6 PackageReader.GetMetadata Method +#### 1.2.6 Package.GetMetadata Method Returns comprehensive metadata including all package information plus detailed file and metadata file contents. -The canonical signature for `GetMetadata` is defined in the [PackageReader Interface](#12-packagereader-interface). +```go +// GetMetadata returns comprehensive package metadata. +func (p *Package) GetMetadata() (*PackageMetadata, error) +``` -##### 1.2.6.1 PackageReader.GetMetadata Parameters +##### 1.2.6.1 Package.GetMetadata Parameters None (pure in-memory operation). 
-##### 1.2.6.2 PackageReader.GetMetadata Returns +##### 1.2.6.2 Package.GetMetadata Returns - `*PackageMetadata` - Comprehensive package metadata; see [Package Metadata API - PackageInfo Structure](api_metadata.md#71-packageinfo-structure) - `error` - Returns `*PackageError` on failure -##### 1.2.6.3 PackageReader.GetMetadata Scope +##### 1.2.6.3 Package.GetMetadata Scope Full metadata view over already-loaded package state. This method MUST NOT perform additional disk I/O. This method MUST NOT perform additional parsing beyond what `OpenPackage` already loaded. -##### 1.2.6.4 PackageReader.GetMetadata Serialization +##### 1.2.6.4 Package.GetMetadata Serialization See [Package Metadata API - Package Information Methods](api_metadata.md#74-package-information-methods). -##### 1.2.6.5 PackageReader.GetMetadata Error Conditions +##### 1.2.6.5 Package.GetMetadata Error Conditions `GetMetadata()` MUST still return an error for internal consistency failures (for example, metadata not loaded due to an invariant violation) even though `OpenPackage` is required to eagerly load metadata. `GetMetadata()` requires a fully opened package instance (it is not applicable to header-only inspection). -##### 1.2.6.6 PackageReader.GetMetadata Concurrency +##### 1.2.6.6 Package.GetMetadata Concurrency Safe for concurrent calls from different goroutines. -#### 1.2.7 PackageReader.Validate Method +#### 1.2.7 Package.Validate Method Validates package format, structure, and integrity. -The canonical signature for `Validate` is defined in the [PackageReader Interface](#12-packagereader-interface). +Note: The canonical signature for `Validate` is defined in [Basic Operations API - Package.Validate](api_basic_operations.md#15-packagevalidate-method). 
-##### 1.2.7.1 PackageReader.Validate Parameters +##### 1.2.7.1 Package.Validate Parameters - `ctx context.Context` - Context for cancellation and timeout handling -##### 1.2.7.2 PackageReader.Validate Returns +##### 1.2.7.2 Package.Validate Returns - `error` - Returns `*PackageError` on failure -##### 1.2.7.3 PackageReader.Validate Behavior +##### 1.2.7.3 Package.Validate Behavior - Performs comprehensive package validation - Can be non-trivial and should be cancellable - Validates format, structure, checksums, and integrity -##### 1.2.7.4 PackageReader.Validate Error Conditions +##### 1.2.7.4 Package.Validate Error Conditions -See [Common Error Mapping Table](#128-packagereader-common-error-mapping-table). +See [Common Read Error Mapping Table](#128-common-read-error-mapping-table). -##### 1.2.7.5 PackageReader.Validate Concurrency +##### 1.2.7.5 Package.Validate Concurrency - Go: Safe for concurrent calls from different goroutines. -#### 1.2.8 PackageReader Common Error Mapping Table +#### 1.2.8 Common Read Error Mapping Table -The following error mapping applies to all `PackageReader` methods: +The following error mapping applies to all package read operations: | Condition | Error Type | Notes | | -------------------------------------- | ------------------- | ----------------------------------------------------------------------------------------------- | @@ -768,22 +770,16 @@ The following error mapping applies to all `PackageReader` methods: | Package integrity check failed | `ErrTypeCorruption` | Integrity failures are corruption. | | Context cancelled or deadline exceeded | `ErrTypeContext` | Applies only to methods that accept `context.Context` (for example, `ReadFile` and `Validate`). | -### 1.3 PackageWriter Interface +### 1.3 Package Write Operations -```go -// PackageWriter defines the interface for writing packages to disk. 
-type PackageWriter interface { - Write(ctx context.Context) error - SafeWrite(ctx context.Context, overwrite bool) error - FastWrite(ctx context.Context) error -} -``` +Package write operations are part of the `Package` interface. +This section describes their high-level behavior and shared error mapping. #### 1.3.1 Memory Versus Disk Side Effects -The `PackageWriter` interface provides methods for writing the in-memory package state to disk. +Package write operations provide methods for writing the in-memory package state to disk. -File management operations (add, remove) are part of the `Package` interface, not `PackageWriter`. +File management operations (add, remove) are part of the `Package` interface, not a separate writer interface. See [File Management API](api_file_mgmt_index.md) for file operations. ##### 1.3.1.1 Write Operations @@ -812,7 +808,7 @@ For detailed information about allowed target paths, overwrite behavior, path re #### 1.3.2 Common Writer Error Mapping Table -The following error mapping applies to all `PackageWriter` methods: +The following error mapping applies to all package write operations: | Condition | Error Type | Notes | | --------------------------------------------------------------------------------- | ----------------- | --------------------------------------------------------------------------- | @@ -1225,6 +1221,13 @@ const ( ) ``` +#### 10.2.1 ErrorType.String Method + +```go +// String returns a human-readable name for the error type. 
+func (t ErrorType) String() string +``` + ### 10.3 ErrorType Categories - **ErrTypeValidation**: Input validation errors, invalid parameters, invalid file paths, invalid patterns @@ -1293,6 +1296,13 @@ func (e *PackageError) Is(target error) bool - If no cause exists, returns `false` - This allows `PackageError` to participate in Go's standard error matching patterns +#### 10.4.4 PackageError.WithContext Method + +```go +// WithContext adds a key/value context entry and returns the updated error. +func (e *PackageError) WithContext(key string, value any) *PackageError +``` + ### 10.5 Error Helper Functions Helper functions for creating and managing structured errors. @@ -1455,7 +1465,7 @@ if pkgErr, ok := AsPackageError(err); ok && pkgErr.Type == ErrTypeValidation { // - docs/tech_specs/api_file_mgmt_addition.md#21-addfile // // Example implementation pattern (not canonical): -func exampleReadFileOperation(ctx context.Context, pkg PackageReader, filePath string) ([]byte, error) { +func exampleReadFileOperation(ctx context.Context, pkg Package, filePath string) ([]byte, error) { // Implementation would wrap errors with structured context return nil, nil } diff --git a/docs/tech_specs/api_file_mgmt_extraction.md b/docs/tech_specs/api_file_mgmt_extraction.md index 08c5d873..bef496ba 100644 --- a/docs/tech_specs/api_file_mgmt_extraction.md +++ b/docs/tech_specs/api_file_mgmt_extraction.md @@ -215,7 +215,7 @@ See [Case Sensitivity](api_core.md#221-case-sensitivity) for complete case sensi ExtractPath writes extracted content to the filesystem. To read file content into memory, use `ReadFile`. -See [ReadFile Method Contract](api_core.md#122-packagereaderreadfile-method). +See [ReadFile Method Contract](api_core.md#122-packagereadfile-method). ## 2. 
ExtractPathOptions Struct diff --git a/docs/tech_specs/api_file_mgmt_file_entry.md b/docs/tech_specs/api_file_mgmt_file_entry.md index 336ca521..d8f2ea26 100644 --- a/docs/tech_specs/api_file_mgmt_file_entry.md +++ b/docs/tech_specs/api_file_mgmt_file_entry.md @@ -918,6 +918,41 @@ func (fe *FileEntry) AssociateWithPathMetadata(pme *PathMetadataEntry) error func (fe *FileEntry) GetPathMetadataForPath(path string) *PathMetadataEntry ``` +### 5.7 FileEntry.GetPaths Method + +```go +// GetPaths returns all paths associated with this FileEntry. +func (fe *FileEntry) GetPaths() []generics.PathEntry +``` + +### 5.8 FileEntry.GetFileID Method + +```go +// GetFileID returns the unique file identifier. +func (fe *FileEntry) GetFileID() uint64 +``` + +### 5.9 FileEntry.GetParentPath Method + +```go +// GetParentPath returns the parent directory path for the primary path. +func (fe *FileEntry) GetParentPath() string +``` + +### 5.10 FileEntry.GetDirectoryDepth Method + +```go +// GetDirectoryDepth returns the depth of the primary path in the hierarchy. +func (fe *FileEntry) GetDirectoryDepth() int +``` + +### 5.11 FileEntry.IsRootRelative Method + +```go +// IsRootRelative returns true if the primary path is root-relative (no parent path). +func (fe *FileEntry) IsRootRelative() bool +``` + ## 6. Marshaling This section describes marshaling operations for FileEntry. @@ -954,7 +989,7 @@ func (fe *FileEntry) MarshalData() ([]byte, error) // Marshal marshals both FileEntry metadata and data. // Returns metadata and data as separate byte slices for flexible writing. // Returns *PackageError on failure. -func (fe *FileEntry) Marshal() (metadata []byte, data []byte, err error) +func (fe *FileEntry) Marshal() (metadata, data []byte, err error) ``` ### 6.2 WriteTo Methods @@ -1007,6 +1042,48 @@ Provides marshaling methods for FileEntry metadata and data, supporting both byt - Writer methods follow the same pattern as `PackageComment.WriteTo` for consistency. 
- Choose byte-slice methods for simplicity, or writer methods for memory efficiency with large files. +### 6.6 FileEntry Binary Format Methods + +This section describes FileEntry methods tied directly to the binary format contract. + +#### 6.6.1 FileEntry.ReadFrom Method + +```go +// ReadFrom reads FileEntry metadata from a reader. +// Implements io.ReaderFrom. +// Returns *PackageError on failure. +func (fe *FileEntry) ReadFrom(r io.Reader) (int64, error) +``` + +#### 6.6.2 FileEntry.Validate Method + +```go +// Validate validates the FileEntry state. +// Returns *PackageError on failure. +func (fe *FileEntry) Validate() error +``` + +#### 6.6.3 FileEntry.FixedSize Method + +```go +// FixedSize returns the size of the fixed FileEntry metadata section in bytes. +func (fe *FileEntry) FixedSize() int +``` + +#### 6.6.4 FileEntry.VariableSize Method + +```go +// VariableSize returns the size of the variable-length FileEntry metadata section in bytes. +func (fe *FileEntry) VariableSize() int +``` + +#### 6.6.5 FileEntry.TotalSize Method + +```go +// TotalSize returns the total size of the FileEntry metadata (fixed + variable) in bytes. +func (fe *FileEntry) TotalSize() int +``` + ## 7. FileEntry Properties This section describes properties and accessors for FileEntry. diff --git a/docs/tech_specs/api_file_mgmt_queries.md b/docs/tech_specs/api_file_mgmt_queries.md index a626363c..a06f8aaa 100644 --- a/docs/tech_specs/api_file_mgmt_queries.md +++ b/docs/tech_specs/api_file_mgmt_queries.md @@ -108,8 +108,8 @@ func (p *Package) FileExists(path string) (bool, error) #### 1.1.2 Package.ListFiles Method ```go -// ListFiles returns all file entries in the package -func (p *Package) ListFiles() ([]*FileEntry, error) +// ListFiles returns lightweight file info for all files in the package +func (p *Package) ListFiles() ([]FileInfo, error) ``` ### 1.2 Purpose @@ -118,9 +118,12 @@ Defines basic file existence checks and listing operations. 
### 1.3 FileEntry Access -Query functions return `*FileEntry` objects or `[]*FileEntry` arrays. +Some query functions return full `*FileEntry` objects (or `[]*FileEntry` arrays). These objects provide comprehensive file information including metadata, compression status, encryption details, checksums, and timestamps. +`ListFiles()` returns `[]FileInfo` for lightweight listing. +Use `GetFileByPath()` or other single-entry lookups when full `*FileEntry` details are required. + ### 1.4 Usage Notes To list directories in the package, use `ListDirectories()` from the [Metadata API](api_metadata.md). diff --git a/docs/tech_specs/api_generics.md b/docs/tech_specs/api_generics.md index e51fcea0..2d336b9b 100644 --- a/docs/tech_specs/api_generics.md +++ b/docs/tech_specs/api_generics.md @@ -699,7 +699,14 @@ func (b *ConfigBuilder[T]) WithCompressionLevel(level int) *ConfigBuilder[T] func (b *ConfigBuilder[T]) WithStrategy(strategy Strategy[T, T]) *ConfigBuilder[T] ``` -##### 1.10.2.7 ConfigBuilder[T].Build Method +##### 1.10.2.7 ConfigBuilder[T].WithValidator Method + +```go +// WithValidator sets the validator for the configuration. +func (b *ConfigBuilder[T]) WithValidator(validator Validator[T]) *ConfigBuilder[T] +``` + +##### 1.10.2.8 ConfigBuilder[T].Build Method ```go // Build constructs and returns the final configuration. 
diff --git a/docs/tech_specs/api_go_defs_index.md b/docs/tech_specs/api_go_defs_index.md index 7fc3e706..7d5b3a48 100644 --- a/docs/tech_specs/api_go_defs_index.md +++ b/docs/tech_specs/api_go_defs_index.md @@ -5,60 +5,54 @@ - [1.1 Package Lifecycle Methods](#11-package-lifecycle-methods) - [1.2 Package File Management Methods](#12-package-file-management-methods) - [1.3 Package Information and Queries Methods](#13-package-information-and-queries-methods) - - [1.4 Package Metadata Methods](#14-package-metadata-methods) - - [1.5 Package Compression Methods](#15-package-compression-methods) - - [1.6 Package Path and Configuration Methods](#16-package-path-and-configuration-methods) - - [1.7 Package Signature Management Methods](#17-package-signature-management-methods) - - [1.8 Package Other Methods](#18-package-other-methods) - - [1.9 Package Helper Functions](#19-package-helper-functions) -- [2. PackageReader Interface Types](#2-packagereader-interface-types) - - [2.1 PackageReader Methods](#21-packagereader-methods) - - [2.1.1 PackageReader Read Operations](#211-packagereader-read-operations) - - [2.1.2 PackageReader Query Operations](#212-packagereader-query-operations) - - [2.1.3 PackageReader Other Methods](#213-packagereader-other-methods) - - [2.2 PackageReader Helper Functions](#22-packagereader-helper-functions) -- [3. PackageWriter Interface Types](#3-packagewriter-interface-types) - - [3.1 PackageWriter Methods](#31-packagewriter-methods) - - [3.1.1 PackageWriter Write Operations](#311-packagewriter-write-operations) - - [3.1.2 PackageWriter Other Methods](#312-packagewriter-other-methods) - - [3.2 PackageWriter Helper Functions](#32-packagewriter-helper-functions) -- [4. 
FileEntry Types](#4-fileentry-types) - - [4.1 FileEntry Methods](#41-fileentry-methods) - - [4.1.1 FileEntry Data Management Methods](#411-fileentry-data-management-methods) - - [4.1.2 FileEntry Transformation Methods](#412-fileentry-transformation-methods) - - [4.2 FileEntry Helper Functions](#42-fileentry-helper-functions) -- [5. Metadata Types](#5-metadata-types) - - [5.1 Metadata Methods](#51-metadata-methods) - - [5.2 Metadata Helper Functions](#52-metadata-helper-functions) -- [6. Compression Types](#6-compression-types) - - [6.1 Compression Methods](#61-compression-methods) - - [6.2 Compression Helper Functions](#62-compression-helper-functions) -- [7. Encryption and Security Types](#7-encryption-and-security-types) - - [7.1 Encryption and Security Methods](#71-encryption-and-security-methods) - - [7.2 Encryption and Security Helper Functions](#72-encryption-and-security-helper-functions) -- [8. Signature Types](#8-signature-types) - - [8.1 Signature Methods](#81-signature-methods) - - [8.2 Signature Helper Functions](#82-signature-helper-functions) -- [9. Streaming and Buffer Types](#9-streaming-and-buffer-types) - - [9.1 Streaming and Buffer Methods](#91-streaming-and-buffer-methods) - - [9.2 Streaming and Buffer Helper Functions](#92-streaming-and-buffer-helper-functions) -- [10. Deduplication Types](#10-deduplication-types) - - [10.1 Deduplication Methods](#101-deduplication-methods) - - [10.2 Deduplication Helper Functions](#102-deduplication-helper-functions) -- [11. FileType System Types](#11-filetype-system-types) - - [11.1 FileType System Methods](#111-filetype-system-methods) - - [11.2 FileType System Helper Functions](#112-filetype-system-helper-functions) -- [12. Generic Types](#12-generic-types) - - [12.1 Generic Methods](#121-generic-methods) - - [12.2 Generic Helper Functions](#122-generic-helper-functions) -- [13. 
Error Types](#13-error-types) - - [13.1 Error Methods](#131-error-methods) - - [13.2 Error Helper Functions](#132-error-helper-functions) -- [14. Other Types](#14-other-types) - - [14.1 Other Type Methods](#141-other-type-methods) -- [15. General Helper Functions](#15-general-helper-functions) - - [15.1 General Validation Functions](#151-general-validation-functions) - - [15.2 General Utility Functions](#152-general-utility-functions) + - [1.4 Package Comment Methods](#14-package-comment-methods) + - [1.5 Package Identity Methods](#15-package-identity-methods) + - [1.6 Package Special File Methods](#16-package-special-file-methods) + - [1.7 Package Path Metadata Methods](#17-package-path-metadata-methods) + - [1.8 Package Symlink Methods](#18-package-symlink-methods) + - [1.9 Package Metadata-Only Methods](#19-package-metadata-only-methods) + - [1.10 Package Info Methods](#110-package-info-methods) + - [1.11 Package Metadata Validation Methods](#111-package-metadata-validation-methods) + - [1.12 Package Metadata Internal Methods](#112-package-metadata-internal-methods) + - [1.13 Package Compression Methods](#113-package-compression-methods) + - [1.14 Package Path and Configuration Methods](#114-package-path-and-configuration-methods) + - [1.15 Package File Encryption Methods](#115-package-file-encryption-methods) + - [1.16 Package Signature Management Methods](#116-package-signature-management-methods) + - [1.17 Package Write Methods](#117-package-write-methods) + - [1.18 Package Other Methods](#118-package-other-methods) + - [1.19 Package Helper Functions](#119-package-helper-functions) +- [2. 
FileEntry Types](#2-fileentry-types) + - [2.1 FileEntry Query Methods](#21-fileentry-query-methods) + - [2.2 FileEntry Data Methods](#22-fileentry-data-methods) + - [2.3 FileEntry Temp File Methods](#23-fileentry-temp-file-methods) + - [2.4 FileEntry Serialization Methods](#24-fileentry-serialization-methods) + - [2.5 FileEntry Path Methods](#25-fileentry-path-methods) + - [2.6 FileEntry Transformation Methods](#26-fileentry-transformation-methods) + - [2.7 FileEntry Helper Functions](#27-fileentry-helper-functions) + - [2.8 Tag Methods](#28-tag-methods) +- [3. Package Metadata Types](#3-package-metadata-types) + - [3.1 Package Metadata Type Methods](#31-package-metadata-type-methods) + - [3.2 Package Metadata Helper Functions](#32-package-metadata-helper-functions) +- [4. Compression Types](#4-compression-types) + - [4.1 Compression Methods](#41-compression-methods) + - [4.2 Compression Helper Functions](#42-compression-helper-functions) +- [5. Encryption and Security Types](#5-encryption-and-security-types) + - [5.1 Encryption and Security Methods](#51-encryption-and-security-methods) + - [5.2 Encryption and Security Helper Functions](#52-encryption-and-security-helper-functions) +- [6. Signature Types](#6-signature-types) + - [6.1 Signature Methods](#61-signature-methods) + - [6.2 Signature Helper Functions](#62-signature-helper-functions) +- [7. Streaming and Buffer Types](#7-streaming-and-buffer-types) + - [7.1 Streaming and Buffer Methods](#71-streaming-and-buffer-methods) + - [7.2 Streaming and Buffer Helper Functions](#72-streaming-and-buffer-helper-functions) +- [8. FileType System Types](#8-filetype-system-types) + - [8.1 FileType System Helper Functions](#81-filetype-system-helper-functions) +- [9. Generic Types](#9-generic-types) + - [9.1 Generic Methods](#91-generic-methods) + - [9.2 Generic Helper Functions](#92-generic-helper-functions) +- [10. 
Error Types](#10-error-types) + - [10.1 Error Methods](#101-error-methods) + - [10.2 Error Helper Functions](#102-error-helper-functions) ## 0. Overview @@ -67,24 +61,28 @@ Use this index to quickly locate specific API elements across the documentation. ## 1. Package Interface Types -- **`filePackage`** - [filePackage Struct](api_core.md#111-filepackage-struct) - - filePackage is the concrete implementation of the Package interface. - - filePackage is documented in the linked spec. - **`Package`** - [Package](api_core.md#11-package-interface) - Package defines the main interface for NovusPack package operations. - - Package combines PackageReader and PackageWriter interfaces, providing complete package lifecycle management including opening, closing, and defragmentation operations. - - Package is documented in the linked spec. + - Package provides a unified v1 API surface for package read and write operations, including complete lifecycle management. + - Package is documented in the linked spec. +- **`RecoveryFileHeader`** - [RecoveryFileHeader](api_writing.md#2721-recoveryfileheader-structure) + - RecoveryFileHeader contains header information for recovery files used by writing operations. +- **`filePackage`** - [filePackage Struct](api_core.md#111-filepackage-struct) + - filePackage is the concrete implementation of the Package interface. + - filePackage is documented in the linked spec. +- **`readOnlyPackage`** - [11.3 readOnlyPackage Struct](api_basic_operations.md#113-readonlypackage-struct) + - readOnlyPackage is a wrapper that enforces read-only behavior for a Package. ### 1.1 Package Lifecycle Methods - **`Package.Close`** - [Package.Close](api_basic_operations.md#13-packageclose-method) - Close closes the package and releases resources Returns *PackageError on failure. +- **`Package.Defragment`** - [Package.Defragment](api_basic_operations.md#16-packagedefragment-method) + - Defragment optimizes the package layout and removes unused space. 
- **`Package.Validate`** - [Package.Validate](api_basic_operations.md#15-packagevalidate-method) - Validate validates package format, structure, and integrity. - **`Package.ValidateIntegrity`** - [Package.ValidateIntegrity](api_security.md#111-packagevalidateintegrity-method) - ValidateIntegrity validates package integrity (checksums and structural consistency). -- **`Package.Defragment`** - [Package.Defragment](api_basic_operations.md#16-packagedefragment-method) - - Defragment optimizes the package layout and removes unused space. ### 1.2 Package File Management Methods @@ -94,10 +92,16 @@ Use this index to quickly locate specific API elements across the documentation. - AddFile adds a file to the package. - **`Package.AddFileFromMemory`** - [Package.AddFileFromMemory](api_file_mgmt_addition.md#22-packageaddfilefrommemory-method) - AddFileFromMemory adds a file to the package from in-memory data. -- **`Package.AddFileWithEncryption`** - [Package.AddFileWithEncryption](api_file_mgmt_addition.md#23-packageaddfilewithencryption-method) - - AddFileWithEncryption adds a file to the package and configures encryption for the entry. +- **`Package.AddFileHash`** - [Package.AddFileHash](api_file_mgmt_updates.md#16-packageaddfilehash-method) + - AddFileHash adds a hash entry to a FileEntry for integrity or deduplication. +- **`Package.AddFilePath`** - [Package.AddFilePath](api_file_mgmt_updates.md#14-packageaddfilepath-method) + - AddFilePath adds an additional stored path to an existing FileEntry. - **`Package.AddFilePattern`** - [Package.AddFilePattern](api_file_mgmt_addition.md#24-packageaddfilepattern-method) - AddFilePattern adds files matching a filesystem pattern into the package. +- **`Package.AddFileWithEncryption`** - [Package.AddFileWithEncryption](api_file_mgmt_addition.md#23-packageaddfilewithencryption-method) + - AddFileWithEncryption adds a file to the package and configures encryption for the entry. 
+- **`Package.AddPathToExistingEntry`** - [Package.AddPathToExistingEntry](api_deduplication.md#313-packageaddpathtoexistingentry-method) + - AddPathToExistingEntry adds an additional path to an existing entry as part of deduplication. - **`Package.ExtractPath`** - [Package.ExtractPath](api_file_mgmt_extraction.md#12-packageextractpath-method) - ExtractPath extracts a file or directory subtree from the package to disk. - **`Package.RemoveDirectory`** - [Package.RemoveDirectory](api_file_mgmt_removal.md#42-packageremovedirectory-method) @@ -108,22 +112,17 @@ Use this index to quickly locate specific API elements across the documentation. - RemoveFile removes a file from the package. - High-level counterpart to AddFile. - Returns *PackageError on failure. +- **`Package.RemoveFilePath`** - [Package.RemoveFilePath](api_file_mgmt_updates.md#15-packageremovefilepath-method) + - RemoveFilePath removes a stored path from an existing FileEntry. - **`Package.RemoveFilePattern`** - [Package.RemoveFilePattern](api_file_mgmt_removal.md#32-packageremovefilepattern-method) - RemoveFilePattern removes files matching a pattern from the package. - High-level counterpart to AddFilePattern. - Returns *PackageError on failure. +- **`Package.UpdateFile`** - [Package.UpdateFile](api_file_mgmt_updates.md#11-packageupdatefile-method) + - UpdateFile updates file content and metadata in the package The new file data is read from the sourceFilePath on the filesystem. + - The storedPath identifies which file in the package to update. - **`Package.UpdateFilePattern`** - [Package.UpdateFilePattern](api_file_mgmt_updates.md#12-packageupdatefilepattern-method) - UpdateFilePattern updates files matching a pattern using a source directory and options. -- **`Package.AddFilePath`** - [Package.AddFilePath](api_file_mgmt_updates.md#14-packageaddfilepath-method) - - AddFilePath adds an additional stored path to an existing FileEntry. 
-- **`Package.RemoveFilePath`** - [Package.RemoveFilePath](api_file_mgmt_updates.md#15-packageremovefilepath-method) - - RemoveFilePath removes a stored path from an existing FileEntry. -- **`Package.AddFileHash`** - [Package.AddFileHash](api_file_mgmt_updates.md#16-packageaddfilehash-method) - - AddFileHash adds a hash entry to a FileEntry for integrity or deduplication. -- **`Package.SetSessionBase`** - [Package.SetSessionBase](api_basic_operations.md#194-packagesetsessionbase-method) - - SetSessionBase explicitly sets the package-level session base path This method allows setting the session base before any file operations Returns *PackageError on failure (e.g., invalid path format). -- **`Package.ClearSessionBase`** - [Package.ClearSessionBase](api_basic_operations.md#196-packageclearsessionbase-method) - - ClearSessionBase clears the current session base path. ### 1.3 Package Information and Queries Methods @@ -131,8 +130,14 @@ Use this index to quickly locate specific API elements across the documentation. - FileExists checks if a file with the given path exists in the package. - **`Package.FindEntriesByPathPatterns`** - [Package.FindEntriesByPathPatterns](api_file_mgmt_queries.md#331-packagefindentriesbypathpatterns-method) - FindEntriesByPathPatterns gets files matching patterns from the package. +- **`Package.FindEntriesByTag`** - [Package.FindEntriesByTag](api_file_mgmt_queries.md#311-packagefindentriesbytag-method) + - FindEntriesByTag finds all FileEntry objects with a specific tag. - **`Package.FindEntriesByType`** - [Package.FindEntriesByType](api_file_mgmt_queries.md#321-packagefindentriesbytype-method) - FindEntriesByType finds all FileEntry objects of a specific type. +- **`Package.FindExistingEntryByCRC32`** - [Package.FindExistingEntryByCRC32](api_deduplication.md#311-packagefindexistingentrybycrc32-method) + - FindExistingEntryByCRC32 finds an existing entry by size and CRC32 (deduplication helper). 
+- **`Package.FindExistingEntryMultiLayer`** - [Package.FindExistingEntryMultiLayer](api_deduplication.md#312-packagefindexistingentrymultilayer-method) + - FindExistingEntryMultiLayer finds an existing entry using multi-layer deduplication checks. - **`Package.GetFileByChecksum`** - [Package.GetFileByChecksum](api_file_mgmt_queries.md#251-packagegetfilebychecksum-method) - GetFileByChecksum gets a FileEntry by CRC32 checksum Returns *PackageError if file not found. - **`Package.GetFileByFileID`** - [Package.GetFileByFileID](api_file_mgmt_queries.md#231-packagegetfilebyfileid-method) @@ -145,188 +150,148 @@ Use this index to quickly locate specific API elements across the documentation. - GetFileByPath gets a FileEntry by path Returns *PackageError if file not found. - **`Package.GetFileCount`** - [Package.GetFileCount](api_file_mgmt_queries.md#411-packagegetfilecount-method) - GetFileCount returns the total number of regular content files in the package Excludes special metadata files (types 65000-65535). +- **`Package.GetInfo`** - [Package.GetInfo](api_core.md#125-packagegetinfo-method) + - GetInfo returns lightweight package information derived from already-loaded package state. +- **`Package.GetMetadata`** - [Package.GetMetadata](api_core.md#126-packagegetmetadata-method) + - GetMetadata returns comprehensive package metadata. +- **`Package.GetMultiPathCount`** - [Package.GetMultiPathCount](api_file_mgmt_updates.md#1732-packagegetmultipathcount-method) + - GetMultiPathCount returns the number of entries with multiple stored paths. +- **`Package.GetMultiPathEntries`** - [Package.GetMultiPathEntries](api_file_mgmt_updates.md#1731-packagegetmultipathentries-method) + - GetMultiPathEntries returns FileEntry objects with multiple stored paths. - **`Package.GetPath`** - [Package.GetPath](api_basic_operations.md#188-packagegetpath-method) - GetPath returns the current package file path. 
-- **`Package.GetSessionBase`** - [Package.GetSessionBase](api_basic_operations.md#195-packagegetsessionbase-method) - - GetSessionBase returns the current session base path Returns empty string if no session base has been established. +- **`Package.GetSecurityStatus`** - [Package.GetSecurityStatus](api_security.md#112-packagegetsecuritystatus-method) + - GetSecurityStatus returns the current security status of the package. - **`Package.IsOpen`** - [Package.IsOpen](api_basic_operations.md#186-packageisopen-method) - IsOpen checks if the package is currently open. - **`Package.IsReadOnly`** - [Package.IsReadOnly](api_basic_operations.md#187-packageisreadonly-method) - IsReadOnly checks if the package is in read-only mode. -- **`Package.ListFiles`** - [Package.ListFiles](api_file_mgmt_queries.md#112-packagelistfiles-method) - - ListFiles returns all file entries in the package. -- **`Package.GetSecurityStatus`** - [Package.GetSecurityStatus](api_security.md#112-packagegetsecuritystatus-method) - - GetSecurityStatus returns the current security status of the package. - **`Package.ListEncryptedFiles`** - [Package.ListEncryptedFiles](api_file_mgmt_queries.md#431-packagelistencryptedfiles-method) - ListEncryptedFiles returns encrypted file entries in the package. -- **`Package.FindExistingEntryByCRC32`** - [Package.FindExistingEntryByCRC32](api_deduplication.md#311-packagefindexistingentrybycrc32-method) - - FindExistingEntryByCRC32 finds an existing entry by size and CRC32 (deduplication helper). -- **`Package.FindExistingEntryMultiLayer`** - [Package.FindExistingEntryMultiLayer](api_deduplication.md#312-packagefindexistingentrymultilayer-method) - - FindExistingEntryMultiLayer finds an existing entry using multi-layer deduplication checks. 
-- **`Package.AddPathToExistingEntry`** - [Package.AddPathToExistingEntry](api_deduplication.md#313-packageaddpathtoexistingentry-method) - - AddPathToExistingEntry adds an additional path to an existing entry as part of deduplication. -- **`Package.SetTargetPath`** - [Package.SetTargetPath](api_basic_operations.md#8-packagesettargetpath-method) - - SetTargetPath changes the package's target write path Returns *PackageError on failure. -- **`Package.updateFilePathAssociations`** - [Package.updateFilePathAssociations](api_basic_operations.md#323-packageupdatefilepathassociations-method) - - updateFilePathAssociations links files to their path metadata Returns *PackageError on failure. +- **`Package.ListFiles`** - [Package.ListFiles](api_file_mgmt_queries.md#112-packagelistfiles-method) + - ListFiles returns lightweight file info for all files in the package. -### 1.4 Package Metadata Methods +### 1.4 Package Comment Methods -- **`Package.AddDirectoryMetadata`** - [Package.AddDirectoryMetadata](api_metadata.md#8219-packageadddirectorymetadata-method) - - AddDirectoryMetadata adds directory path metadata (metadata-only, does not add files) Returns *PackageError on failure. -- **`Package.AddIndexFile`** - [Package.AddIndexFile](api_metadata.md#531-packageaddindexfile-method) - - AddIndexFile adds a package index file Returns *PackageError on failure. -- **`Package.AddManifestFile`** - [Package.AddManifestFile](api_metadata.md#521-packageaddmanifestfile-method) - - AddManifestFile adds a package manifest file Returns *PackageError on failure. -- **`Package.AddMetadataFile`** - [Package.AddMetadataFile](api_metadata.md#511-packageaddmetadatafile-method) - - AddMetadataFile adds a YAML metadata file to the package Returns *PackageError on failure. 
-- **`Package.AddMetadataOnlyFile`** - [Package.AddMetadataOnlyFile](api_metadata.md#642-packageaddmetadataonlyfile-method) - - AddMetadataOnlyFile adds a special metadata file to a metadata-only package Returns *PackageError on failure. -- **`Package.AddPathMetadata`** - [Package.AddPathMetadata](api_metadata.md#8213-packageaddpathmetadata-method) - - AddPathMetadata adds a new path metadata entry to the package Returns *PackageError on failure. -- **`Package.AddSignatureFile`** - [Package.AddSignatureFile](api_metadata.md#541-packageaddsignaturefile-method) - - AddSignatureFile adds a digital signature file Returns *PackageError on failure. -- **`Package.AddSymlink`** - [Package.AddSymlink](api_metadata.md#8531-packageaddsymlink-method) - - AddSymlink adds a symbolic link to the package. - - Parameter: symlink: SymlinkEntry to add. - - Return: Error if validation fails or symlink cannot be added. - - Validation: Calls ValidateSymlinkPaths() to ensure paths are valid and within package root. - - Validation: Verifies target exists as FileEntry or PathMetadataEntry directory. - - Validation: Returns ErrTypeValidation, ErrTypeSecurity, or ErrTypeNotFound on validation failure Returns *PackageError on failure. -- **`Package.ClearAppID`** - [Package.ClearAppID](api_metadata.md#212-packageclearappid-method) - - ClearAppID removes the package AppID (set to 0) Returns *PackageError on failure. - **`Package.ClearComment`** - [Package.ClearComment](api_metadata.md#113-packageclearcomment-method) - ClearComment removes the package comment Returns *PackageError on failure. +- **`Package.GetComment`** - [Package.GetComment](api_metadata.md#112-packagegetcomment-method) + - GetComment retrieves the current package comment. +- **`Package.HasComment`** - [Package.HasComment](api_metadata.md#114-packagehascomment-method) + - HasComment checks if the package has a comment. 
+- **`Package.SetComment`** - [Package.SetComment](api_metadata.md#111-packagesetcomment-method) + - SetComment sets or updates the package comment Returns *PackageError on failure. + +### 1.5 Package Identity Methods + +- **`Package.ClearAppID`** - [Package.ClearAppID](api_metadata.md#212-packageclearappid-method) + - ClearAppID removes the package AppID (set to 0) Returns *PackageError on failure. - **`Package.ClearPackageIdentity`** - [Package.ClearPackageIdentity](api_metadata.md#412-packageclearpackageidentity-method) - ClearPackageIdentity clears both VendorID and AppID Returns *PackageError on failure. - **`Package.ClearVendorID`** - [Package.ClearVendorID](api_metadata.md#312-packageclearvendorid-method) - ClearVendorID removes the package VendorID (set to 0) Returns *PackageError on failure. -- **`Package.ConvertPathsToSymlinks`** - [Package.ConvertPathsToSymlinks](api_file_mgmt_updates.md#1711-packageconvertpathstosymlinks-method) - - ConvertPathsToSymlinks converts duplicate paths on a FileEntry to symlinks. - - Parameter: ctx: Context for cancellation and timeout. - - Parameter: entry: FileEntry with multiple paths (PathCount > 1). - - Parameter: options: Path-to-symlink conversion options (primary path selection, metadata preservation). - - Return: Updated FileEntry with single path. - - Return: Slice of created SymlinkEntry objects. -- **`Package.ConvertSymlinksToHardLinks`** - [Package.ConvertSymlinksToHardLinks](api_file_mgmt_updates.md#1721-packageconvertsymlinkstohardlinks-method) - - ConvertSymlinksToHardLinks converts symlinks back to hard links (reverse operation). - - Parameter: ctx: Context for cancellation and timeout. - - Parameter: symlinkEntry: SymlinkEntry to convert back to hard link. - - Return: Updated FileEntry with additional path added. - - Return: Error if conversion fails. - - Behavior: Removes SymlinkEntry from package. 
-- **`Package.ConvertAllPathsToSymlinks`** - [Package.ConvertAllPathsToSymlinks](api_file_mgmt_updates.md#1712-packageconvertallpathstosymlinks-method) - - ConvertAllPathsToSymlinks converts all eligible multi-path entries to use symlinks. -- **`Package.ConvertAllSymlinksToHardLinks`** - [Package.ConvertAllSymlinksToHardLinks](api_file_mgmt_updates.md#1722-packageconvertallsymlinkstohardlinks-method) - - ConvertAllSymlinksToHardLinks converts all symlinks back to hard links. -- **`Package.GetMultiPathEntries`** - [Package.GetMultiPathEntries](api_file_mgmt_updates.md#1731-packagegetmultipathentries-method) - - GetMultiPathEntries returns FileEntry objects with multiple stored paths. -- **`Package.GetMultiPathCount`** - [Package.GetMultiPathCount](api_file_mgmt_updates.md#1732-packagegetmultipathcount-method) - - GetMultiPathCount returns the number of entries with multiple stored paths. -- **`Package.CreateSpecialMetadataFile`** - [Package.CreateSpecialMetadataFile](api_metadata.md#82114-packagecreatespecialmetadatafile-method) - - CreateSpecialMetadataFile creates a special metadata FileEntry Returns *PackageError on failure. -- **`Package.FindEntriesByTag`** - [Package.FindEntriesByTag](api_file_mgmt_queries.md#311-packagefindentriesbytag-method) - - FindEntriesByTag finds all FileEntry objects with a specific tag. - **`Package.GetAppID`** - [Package.GetAppID](api_metadata.md#211-packagegetappid-method) - GetAppID retrieves the current package AppID. - **`Package.GetAppIDInfo`** - [Package.GetAppIDInfo](api_metadata.md#214-packagegetappidinfo-method) - GetAppIDInfo gets detailed AppID information if available. -- **`Package.GetComment`** - [Package.GetComment](api_metadata.md#112-packagegetcomment-method) - - GetComment retrieves the current package comment. 
-- **`Package.GetFilePathAssociation`** - [Package.GetFilePathAssociation](api_metadata.md#8511-packagegetfilepathassociation-method) - - Package.GetFilePathAssociation File-path association query methods (Package-level) These methods work with path strings to find and return associated structs. -- **`Package.GetFilesInPath`** - [Package.GetFilesInPath](api_metadata.md#8512-packagegetfilesinpath-method) - - GetFilesInPath returns all file entries within the specified path. +- **`Package.GetPackageIdentity`** - [Package.GetPackageIdentity](api_metadata.md#411-packagegetpackageidentity-method) + - GetPackageIdentity gets both VendorID and AppID. +- **`Package.GetVendorID`** - [Package.GetVendorID](api_metadata.md#311-packagegetvendorid-method) + - GetVendorID retrieves the current package VendorID. +- **`Package.GetVendorIDInfo`** - [Package.GetVendorIDInfo](api_metadata.md#314-packagegetvendoridinfo-method) + - GetVendorIDInfo gets detailed VendorID information if available. +- **`Package.HasAppID`** - [Package.HasAppID](api_metadata.md#213-packagehasappid-method) + - HasAppID checks if the package has an AppID (non-zero). +- **`Package.HasVendorID`** - [Package.HasVendorID](api_metadata.md#313-packagehasvendorid-method) + - HasVendorID checks if the package has a VendorID (non-zero). +- **`Package.SetAppID`** - [Package.SetAppID](api_metadata.md#21-packagesetappid-method) + - SetAppID sets or updates the package AppID Returns *PackageError on failure. +- **`Package.SetPackageIdentity`** - [Package.SetPackageIdentity](api_metadata.md#41-packagesetpackageidentity-method) + - SetPackageIdentity sets both VendorID and AppID Returns *PackageError on failure. +- **`Package.SetVendorID`** - [Package.SetVendorID](api_metadata.md#31-packagesetvendorid-method) + - SetVendorID sets or updates the package VendorID Returns *PackageError on failure. 
+ +### 1.6 Package Special File Methods + +- **`Package.AddIndexFile`** - [Package.AddIndexFile](api_metadata.md#531-packageaddindexfile-method) + - AddIndexFile adds a package index file Returns *PackageError on failure. +- **`Package.AddManifestFile`** - [Package.AddManifestFile](api_metadata.md#521-packageaddmanifestfile-method) + - AddManifestFile adds a package manifest file Returns *PackageError on failure. +- **`Package.AddMetadataFile`** - [Package.AddMetadataFile](api_metadata.md#511-packageaddmetadatafile-method) + - AddMetadataFile adds a YAML metadata file to the package Returns *PackageError on failure. +- **`Package.AddSignatureFile`** - [Package.AddSignatureFile](api_metadata.md#541-packageaddsignaturefile-method) + - AddSignatureFile adds a digital signature file Returns *PackageError on failure. +- **`Package.CreateSpecialMetadataFile`** - [Package.CreateSpecialMetadataFile](api_metadata.md#82114-packagecreatespecialmetadatafile-method) + - CreateSpecialMetadataFile creates a special metadata FileEntry Returns *PackageError on failure. - **`Package.GetIndexFile`** - [Package.GetIndexFile](api_metadata.md#532-packagegetindexfile-method) - GetIndexFile retrieves the package index Returns *PackageError on failure. - **`Package.GetManifestFile`** - [Package.GetManifestFile](api_metadata.md#522-packagegetmanifestfile-method) - GetManifestFile retrieves the package manifest Returns *PackageError on failure. - **`Package.GetMetadataFile`** - [Package.GetMetadataFile](api_metadata.md#512-packagegetmetadatafile-method) - GetMetadataFile retrieves metadata from the special metadata file Returns *PackageError on failure. 
-- **`Package.GetMetadataIndexOffset`** - [Package.GetMetadataIndexOffset](api_package_compression.md#7292-packagegetmetadataindexoffset-method) - - GetMetadataIndexOffset returns the offset to metadata index Returns fixed offset 112 bytes (PackageHeaderSize) when compression enabled Returns *PackageError if package is not compressed (no metadata index). -- **`Package.GetMetadataOnlyFiles`** - [Package.GetMetadataOnlyFiles](api_metadata.md#643-packagegetmetadataonlyfiles-method) - - GetMetadataOnlyFiles returns all metadata files in the package Returns *PackageError on failure. -- **`Package.GetPackageIdentity`** - [Package.GetPackageIdentity](api_metadata.md#411-packagegetpackageidentity-method) - - GetPackageIdentity gets both VendorID and AppID. -- **`Package.GetPackageInfo`** - [Package.GetPackageInfo](api_metadata.md#741-packagegetpackageinfo-method) - - GetPackageInfo returns comprehensive package information. -- **`Package.GetPathConflicts`** - [Package.GetPathConflicts](api_metadata.md#82113-packagegetpathconflicts-method) - - GetPathConflicts returns a list of paths with conflicting metadata Returns *PackageError on failure. -- **`Package.GetPathFiles`** - [Package.GetPathFiles](api_metadata.md#8513-packagegetpathfiles-method) - - GetPathFiles returns all file entries associated with the specified path. -- **`Package.GetPathMetadata`** - [Package.GetPathMetadata](api_metadata.md#8211-packagegetpathmetadata-method) - - GetPathMetadata retrieves all path metadata entries from the package Returns *PackageError on failure. -- **`Package.GetPathStats`** - [Package.GetPathStats](api_metadata.md#8522-packagegetpathstats-method) - - GetPathStats returns statistics for all paths in the package. -- **`Package.GetPathTree`** - [Package.GetPathTree](api_metadata.md#8521-packagegetpathtree-method) - - Package.GetPathTree Path hierarchy analysis. 
- **`Package.GetSignatureFile`** - [Package.GetSignatureFile](api_metadata.md#542-packagegetsignaturefile-method) - GetSignatureFile retrieves the signature file Returns *PackageError on failure. - **`Package.GetSpecialFileByType`** - [Package.GetSpecialFileByType](api_metadata.md#552-packagegetspecialfilebytype-method) - GetSpecialFileByType retrieves special file by type. - **`Package.GetSpecialFiles`** - [Package.GetSpecialFiles](api_metadata.md#551-packagegetspecialfiles-method) - GetSpecialFiles returns all special files in the package. -- **`Package.GetSymlink`** - [Package.GetSymlink](api_metadata.md#8533-packagegetsymlink-method) - - Package.GetSymlink Returns *PackageError on failure. -- **`Package.GetVendorID`** - [Package.GetVendorID](api_metadata.md#311-packagegetvendorid-method) - - GetVendorID retrieves the current package VendorID. -- **`Package.GetVendorIDInfo`** - [Package.GetVendorIDInfo](api_metadata.md#314-packagegetvendoridinfo-method) - - GetVendorIDInfo gets detailed VendorID information if available. -- **`Package.HasAppID`** - [Package.HasAppID](api_metadata.md#213-packagehasappid-method) - - HasAppID checks if the package has an AppID (non-zero). -- **`Package.HasComment`** - [Package.HasComment](api_metadata.md#114-packagehascomment-method) - - HasComment checks if the package has a comment. - **`Package.HasIndexFile`** - [Package.HasIndexFile](api_metadata.md#535-packagehasindexfile-method) - HasIndexFile checks if package has an index file. - **`Package.HasManifestFile`** - [Package.HasManifestFile](api_metadata.md#525-packagehasmanifestfile-method) - HasManifestFile checks if package has a manifest file. - **`Package.HasMetadataFile`** - [Package.HasMetadataFile](api_metadata.md#515-packagehasmetadatafile-method) - HasMetadataFile checks if package has a metadata file. 
-- **`Package.HasMetadataIndex`** - [Package.HasMetadataIndex](api_package_compression.md#7291-packagehasmetadataindex-method) - - HasMetadataIndex checks if package has metadata index (compression enabled) Returns true if header flags bits 15-8 != 0. - **`Package.HasSignatureFile`** - [Package.HasSignatureFile](api_metadata.md#545-packagehassignaturefile-method) - HasSignatureFile checks if package has a signature file. -- **`Package.HasVendorID`** - [Package.HasVendorID](api_metadata.md#313-packagehasvendorid-method) - - HasVendorID checks if the package has a VendorID (non-zero). -- **`Package.IsMetadataOnlyPackage`** - [Package.IsMetadataOnlyPackage](api_metadata.md#641-packageismetadataonlypackage-method) - - IsMetadataOnlyPackage checks if package contains only metadata files. -- **`Package.ListSymlinks`** - [Package.ListSymlinks](api_metadata.md#8534-packagelistsymlinks-method) - - Package.ListSymlinks Returns *PackageError on failure. -- **`Package.loadPathMetadata`** - [Package.loadPathMetadata](api_basic_operations.md#322-packageloadpathmetadata-method) - - loadPathMetadata loads path metadata from special files Returns *PackageError on failure. -- **`Package.loadSpecialMetadataFiles`** - [Package.loadSpecialMetadataFiles](api_basic_operations.md#321-packageloadspecialmetadatafiles-method) - - loadSpecialMetadataFiles loads all special metadata files Returns *PackageError on failure. -- **`Package.LoadSymlinkMetadataFile`** - [Package.LoadSymlinkMetadataFile](api_metadata.md#8537-packageloadsymlinkmetadatafile-method) - - Package.LoadSymlinkMetadataFile Returns *PackageError on failure. -- **`Package.RefreshPackageInfo`** - [Package.RefreshPackageInfo](api_metadata.md#742-packagerefreshpackageinfo-method) - - RefreshPackageInfo refreshes package information from the file on-disk Returns *PackageError on failure. 
-- **`Package.RemoveDirectoryMetadata`** - [Package.RemoveDirectoryMetadata](api_metadata.md#82110-packageremovedirectorymetadata-method) - - RemoveDirectoryMetadata removes directory path metadata (metadata-only, does not remove files) Returns *PackageError on failure. - **`Package.RemoveIndexFile`** - [Package.RemoveIndexFile](api_metadata.md#534-packageremoveindexfile-method) - RemoveIndexFile removes the package index Returns *PackageError on failure. - **`Package.RemoveManifestFile`** - [Package.RemoveManifestFile](api_metadata.md#524-packageremovemanifestfile-method) - RemoveManifestFile removes the package manifest Returns *PackageError on failure. - **`Package.RemoveMetadataFile`** - [Package.RemoveMetadataFile](api_metadata.md#514-packageremovemetadatafile-method) - RemoveMetadataFile removes the package metadata file Returns *PackageError on failure. -- **`Package.RemovePathMetadata`** - [Package.RemovePathMetadata](api_metadata.md#8214-packageremovepathmetadata-method) - - RemovePathMetadata removes a path metadata entry by path Returns *PackageError on failure. - **`Package.RemoveSignatureFile`** - [Package.RemoveSignatureFile](api_metadata.md#544-packageremovesignaturefile-method) - RemoveSignatureFile removes the signature file Returns *PackageError on failure. - **`Package.RemoveSpecialFile`** - [Package.RemoveSpecialFile](api_metadata.md#553-packageremovespecialfile-method) - RemoveSpecialFile removes a special file by type Returns *PackageError on failure. - **`Package.RemoveSpecialMetadataFile`** - [Package.RemoveSpecialMetadataFile](api_metadata.md#82116-packageremovespecialmetadatafile-method) - RemoveSpecialMetadataFile removes a special metadata file Returns *PackageError on failure. -- **`Package.RemoveSymlink`** - [Package.RemoveSymlink](api_metadata.md#8532-packageremovesymlink-method) - - Package.RemoveSymlink Returns *PackageError on failure. 
-- **`Package.SavePathMetadataFile`** - [Package.SavePathMetadataFile](api_metadata.md#8351-packagesavepathmetadatafile-method) - - SavePathMetadataFile creates and saves the path metadata file. -- **`Package.SaveSymlinkMetadataFile`** - [Package.SaveSymlinkMetadataFile](api_metadata.md#8536-packagesavesymlinkmetadatafile-method) - - Package.SaveSymlinkMetadataFile Returns *PackageError on failure. -- **`Package.SetAppID`** - [Package.SetAppID](api_metadata.md#21-packagesetappid-method) - - SetAppID sets or updates the package AppID Returns *PackageError on failure. -- **`Package.SetComment`** - [Package.SetComment](api_metadata.md#111-packagesetcomment-method) - - SetComment sets or updates the package comment Returns *PackageError on failure. +- **`Package.UpdateIndexFile`** - [Package.UpdateIndexFile](api_metadata.md#533-packageupdateindexfile-method) + - UpdateIndexFile updates the package index Returns *PackageError on failure. +- **`Package.UpdateManifestFile`** - [Package.UpdateManifestFile](api_metadata.md#523-packageupdatemanifestfile-method) + - UpdateManifestFile updates the package manifest Returns *PackageError on failure. +- **`Package.UpdateMetadataFile`** - [Package.UpdateMetadataFile](api_metadata.md#513-packageupdatemetadatafile-method) + - UpdateMetadataFile updates the package metadata file Returns *PackageError on failure. +- **`Package.UpdateSignatureFile`** - [Package.UpdateSignatureFile](api_metadata.md#543-packageupdatesignaturefile-method) + - UpdateSignatureFile updates the signature file Returns *PackageError on failure. +- **`Package.UpdateSpecialMetadataFile`** - [Package.UpdateSpecialMetadataFile](api_metadata.md#82115-packageupdatespecialmetadatafile-method) + - UpdateSpecialMetadataFile updates an existing special metadata file Returns *PackageError on failure. 
+- **`Package.UpdateSpecialMetadataFlags`** - [Package.UpdateSpecialMetadataFlags](api_metadata.md#8352-packageupdatespecialmetadataflags-method) + - UpdateSpecialMetadataFlags updates package header flags based on special files. + +### 1.7 Package Path Metadata Methods + +- **`Package.AddDirectoryMetadata`** - [Package.AddDirectoryMetadata](api_metadata.md#8219-packageadddirectorymetadata-method) + - AddDirectoryMetadata adds directory path metadata (metadata-only, does not add files) Returns *PackageError on failure. +- **`Package.AddPathMetadata`** - [Package.AddPathMetadata](api_metadata.md#8213-packageaddpathmetadata-method) + - AddPathMetadata adds a new path metadata entry to the package Returns *PackageError on failure. +- **`Package.GetFilePathAssociation`** - [Package.GetFilePathAssociation](api_metadata.md#8511-packagegetfilepathassociation-method) + - Package.GetFilePathAssociation File-path association query methods (Package-level) These methods work with path strings to find and return associated structs. +- **`Package.GetFilesInPath`** - [Package.GetFilesInPath](api_metadata.md#8512-packagegetfilesinpath-method) + - GetFilesInPath returns all file entries within the specified path. +- **`Package.GetPathConflicts`** - [Package.GetPathConflicts](api_metadata.md#82113-packagegetpathconflicts-method) + - GetPathConflicts returns a list of paths with conflicting metadata Returns *PackageError on failure. +- **`Package.GetPathFiles`** - [Package.GetPathFiles](api_metadata.md#8513-packagegetpathfiles-method) + - GetPathFiles returns all file entries associated with the specified path. +- **`Package.GetPathMetadata`** - [Package.GetPathMetadata](api_metadata.md#8211-packagegetpathmetadata-method) + - GetPathMetadata retrieves all path metadata entries from the package Returns *PackageError on failure. 
+- **`Package.GetPathStats`** - [Package.GetPathStats](api_metadata.md#8522-packagegetpathstats-method) + - GetPathStats returns statistics for all paths in the package. +- **`Package.GetPathTree`** - [Package.GetPathTree](api_metadata.md#8521-packagegetpathtree-method) + - Package.GetPathTree Path hierarchy analysis. +- **`Package.RemoveDirectoryMetadata`** - [Package.RemoveDirectoryMetadata](api_metadata.md#82110-packageremovedirectorymetadata-method) + - RemoveDirectoryMetadata removes directory path metadata (metadata-only, does not remove files) Returns *PackageError on failure. +- **`Package.RemovePathMetadata`** - [Package.RemovePathMetadata](api_metadata.md#8214-packageremovepathmetadata-method) + - RemovePathMetadata removes a path metadata entry by path Returns *PackageError on failure. - **`Package.SetDestPath`** - [Package.SetDestPath](api_metadata.md#8216-packagesetdestpath-method) - SetDestPath sets destination extraction directory overrides for a stored path. - This is a pure in-memory operation. @@ -334,12 +299,8 @@ Use this index to quickly locate specific API elements across the documentation. - If storedPath does not begin with "/", the implementation MUST prefix "/" before matching or creating entries. - If no PathMetadataEntry exists for storedPath, SetDestPath MUST create one. - The new entry type MUST be inferred from storedPath. -- **`Package.SetPackageIdentity`** - [Package.SetPackageIdentity](api_metadata.md#41-packagesetpackageidentity-method) - - SetPackageIdentity sets both VendorID and AppID Returns *PackageError on failure. - **`Package.SetPathMetadata`** - [Package.SetPathMetadata](api_metadata.md#8212-packagesetpathmetadata-method) - SetPathMetadata replaces all path metadata entries in the package Returns *PackageError on failure. -- **`Package.SetVendorID`** - [Package.SetVendorID](api_metadata.md#31-packagesetvendorid-method) - - SetVendorID sets or updates the package VendorID Returns *PackageError on failure. 
- **`Package.TargetExists`** - [Package.TargetExists](api_metadata.md#8542-packagetargetexists-method) - TargetExists checks if a path exists as FileEntry or directory PathMetadataEntry. - Parameter: ctx: Context for cancellation and timeout. @@ -348,27 +309,67 @@ Use this index to quickly locate specific API elements across the documentation. - Return: false otherwise. - **`Package.UpdateDirectoryMetadata`** - [Package.UpdateDirectoryMetadata](api_metadata.md#82111-packageupdatedirectorymetadata-method) - UpdateDirectoryMetadata updates directory path metadata (metadata-only, does not modify files) Returns *PackageError on failure. -- **`Package.UpdateFile`** - [Package.UpdateFile](api_file_mgmt_updates.md#11-packageupdatefile-method) - - UpdateFile updates file content and metadata in the package The new file data is read from the sourceFilePath on the filesystem. - - The storedPath identifies which file in the package to update. - **`Package.UpdateFileMetadata`** - [Package.UpdateFileMetadata](api_file_mgmt_updates.md#13-packageupdatefilemetadata-method) - UpdateFileMetadata updates file metadata without changing content. -- **`Package.UpdateIndexFile`** - [Package.UpdateIndexFile](api_metadata.md#533-packageupdateindexfile-method) - - UpdateIndexFile updates the package index Returns *PackageError on failure. -- **`Package.UpdateManifestFile`** - [Package.UpdateManifestFile](api_metadata.md#523-packageupdatemanifestfile-method) - - UpdateManifestFile updates the package manifest Returns *PackageError on failure. -- **`Package.UpdateMetadataFile`** - [Package.UpdateMetadataFile](api_metadata.md#513-packageupdatemetadatafile-method) - - UpdateMetadataFile updates the package metadata file Returns *PackageError on failure. - **`Package.UpdatePathMetadata`** - [Package.UpdatePathMetadata](api_metadata.md#8215-packageupdatepathmetadata-method) - UpdatePathMetadata updates an existing path metadata entry Returns *PackageError on failure. 
-- **`Package.UpdateSignatureFile`** - [Package.UpdateSignatureFile](api_metadata.md#543-packageupdatesignaturefile-method) - - UpdateSignatureFile updates the signature file Returns *PackageError on failure. -- **`Package.UpdateSpecialMetadataFile`** - [Package.UpdateSpecialMetadataFile](api_metadata.md#82115-packageupdatespecialmetadatafile-method) - - UpdateSpecialMetadataFile updates an existing special metadata file Returns *PackageError on failure. -- **`Package.UpdateSpecialMetadataFlags`** - [Package.UpdateSpecialMetadataFlags](api_metadata.md#8352-packageupdatespecialmetadataflags-method) - - UpdateSpecialMetadataFlags updates package header flags based on special files. + +### 1.8 Package Symlink Methods + +- **`Package.AddSymlink`** - [Package.AddSymlink](api_metadata.md#8531-packageaddsymlink-method) + - AddSymlink adds a symbolic link to the package. + - Parameter: symlink: SymlinkEntry to add. + - Return: Error if validation fails or symlink cannot be added. + - Validation: Calls ValidateSymlinkPaths() to ensure paths are valid and within package root. + - Validation: Verifies target exists as FileEntry or PathMetadataEntry directory. + - Validation: Returns ErrTypeValidation, ErrTypeSecurity, or ErrTypeNotFound on validation failure Returns *PackageError on failure. +- **`Package.ConvertAllPathsToSymlinks`** - [Package.ConvertAllPathsToSymlinks](api_file_mgmt_updates.md#1712-packageconvertallpathstosymlinks-method) + - ConvertAllPathsToSymlinks converts all eligible multi-path entries to use symlinks. +- **`Package.ConvertAllSymlinksToHardLinks`** - [Package.ConvertAllSymlinksToHardLinks](api_file_mgmt_updates.md#1722-packageconvertallsymlinkstohardlinks-method) + - ConvertAllSymlinksToHardLinks converts all symlinks back to hard links. 
+- **`Package.ConvertPathsToSymlinks`** - [Package.ConvertPathsToSymlinks](api_file_mgmt_updates.md#1711-packageconvertpathstosymlinks-method) + - ConvertPathsToSymlinks converts duplicate paths on a FileEntry to symlinks. + - Parameter: ctx: Context for cancellation and timeout. + - Parameter: entry: FileEntry with multiple paths (PathCount > 1). + - Parameter: options: Path-to-symlink conversion options (primary path selection, metadata preservation). + - Return: Updated FileEntry with single path. + - Return: Slice of created SymlinkEntry objects. +- **`Package.ConvertSymlinksToHardLinks`** - [Package.ConvertSymlinksToHardLinks](api_file_mgmt_updates.md#1721-packageconvertsymlinkstohardlinks-method) + - ConvertSymlinksToHardLinks converts symlinks back to hard links (reverse operation). + - Parameter: ctx: Context for cancellation and timeout. + - Parameter: symlinkEntry: SymlinkEntry to convert back to hard link. + - Return: Updated FileEntry with additional path added. + - Return: Error if conversion fails. + - Behavior: Removes SymlinkEntry from package. +- **`Package.GetSymlink`** - [Package.GetSymlink](api_metadata.md#8533-packagegetsymlink-method) + - Package.GetSymlink Returns *PackageError on failure. +- **`Package.ListSymlinks`** - [Package.ListSymlinks](api_metadata.md#8534-packagelistsymlinks-method) + - Package.ListSymlinks Returns *PackageError on failure. +- **`Package.LoadSymlinkMetadataFile`** - [Package.LoadSymlinkMetadataFile](api_metadata.md#8537-packageloadsymlinkmetadatafile-method) + - Package.LoadSymlinkMetadataFile Returns *PackageError on failure. +- **`Package.RemoveSymlink`** - [Package.RemoveSymlink](api_metadata.md#8532-packageremovesymlink-method) + - Package.RemoveSymlink Returns *PackageError on failure. - **`Package.UpdateSymlink`** - [Package.UpdateSymlink](api_metadata.md#8535-packageupdatesymlink-method) - Package.UpdateSymlink Returns *PackageError on failure. 
+ +### 1.9 Package Metadata-Only Methods + +- **`Package.AddMetadataOnlyFile`** - [Package.AddMetadataOnlyFile](api_metadata.md#642-packageaddmetadataonlyfile-method) + - AddMetadataOnlyFile adds a special metadata file to a metadata-only package Returns *PackageError on failure. +- **`Package.GetMetadataOnlyFiles`** - [Package.GetMetadataOnlyFiles](api_metadata.md#643-packagegetmetadataonlyfiles-method) + - GetMetadataOnlyFiles returns all metadata files in the package Returns *PackageError on failure. +- **`Package.IsMetadataOnlyPackage`** - [Package.IsMetadataOnlyPackage](api_metadata.md#641-packageismetadataonlypackage-method) + - IsMetadataOnlyPackage checks if package contains only metadata files. + +### 1.10 Package Info Methods + +- **`Package.GetPackageInfo`** - [Package.GetPackageInfo](api_metadata.md#741-packagegetpackageinfo-method) + - GetPackageInfo returns comprehensive package information. +- **`Package.RefreshPackageInfo`** - [Package.RefreshPackageInfo](api_metadata.md#742-packagerefreshpackageinfo-method) + - RefreshPackageInfo refreshes package information from the file on-disk Returns *PackageError on failure. + +### 1.11 Package Metadata Validation Methods + - **`Package.ValidateMetadataOnlyIntegrity`** - [Package.ValidateMetadataOnlyIntegrity](api_metadata.md#644-packagevalidatemetadataonlyintegrity-method) - ValidateMetadataOnlyIntegrity validates metadata-only package integrity Returns *PackageError on failure. - **`Package.ValidateMetadataOnlyPackage`** - [Package.ValidateMetadataOnlyPackage](api_metadata.md#645-packagevalidatemetadataonlypackage-method) @@ -393,7 +394,20 @@ Use this index to quickly locate specific API elements across the documentation. - Return: Error if validation fails Validation performed:. - Return: Both paths are package-relative (start with "/"). 
-### 1.5 Package Compression Methods +### 1.12 Package Metadata Internal Methods + +- **`Package.SavePathMetadataFile`** - [Package.SavePathMetadataFile](api_metadata.md#8351-packagesavepathmetadatafile-method) + - SavePathMetadataFile creates and saves the path metadata file. +- **`Package.SaveSymlinkMetadataFile`** - [Package.SaveSymlinkMetadataFile](api_metadata.md#8536-packagesavesymlinkmetadatafile-method) + - Package.SaveSymlinkMetadataFile Returns *PackageError on failure. +- **`Package.loadPathMetadata`** - [Package.loadPathMetadata](api_basic_operations.md#322-packageloadpathmetadata-method) + - loadPathMetadata loads path metadata from special files Returns *PackageError on failure. +- **`Package.loadSpecialMetadataFiles`** - [Package.loadSpecialMetadataFiles](api_basic_operations.md#321-packageloadspecialmetadatafiles-method) + - loadSpecialMetadataFiles loads all special metadata files Returns *PackageError on failure. +- **`Package.updateFilePathAssociations`** - [Package.updateFilePathAssociations](api_basic_operations.md#323-packageupdatefilepathassociations-method) + - updateFilePathAssociations links files to their path metadata Returns *PackageError on failure. + +### 1.13 Package Compression Methods - **`Package.CanCompressPackage`** - [Package.CanCompressPackage](api_package_compression.md#7282-packagecancompresspackage-method) - CanCompressPackage checks if package can be compressed (not signed). @@ -405,8 +419,6 @@ Use this index to quickly locate specific API elements across the documentation. - CompressPackage compresses package content in memory Compresses file entries and data separately using LZ4 for metadata and specified type for data Compresses file index with LZ4 Creates metadata index for fast access (NOT header, metadata index, comment, or signatures) Signed packages cannot be compressed Returns *PackageError on failure. 
- **`Package.CompressPackageConcurrent`** - [Package.CompressPackageConcurrent](api_package_compression.md#831-packagecompresspackageconcurrent-method) - CompressPackageConcurrent compresses package content using worker pool Returns *PackageError on failure. -- **`Package.compressPackageContent`** - [Package.compressPackageContent](api_package_compression.md#731-packagecompresspackagecontent-method) - - Package.compressPackageContent Internal compression methods (used by CompressPackage and Write) Returns *PackageError on failure. - **`Package.CompressPackageFile`** - [Package.CompressPackageFile](api_package_compression.md#61-packagecompresspackagefile-method) - CompressPackageFile compresses package content and writes to specified path Compresses file entries + data + index (NOT header, comment, or signatures) Signed packages cannot be compressed Returns *PackageError on failure. - **`Package.CompressPackageStream`** - [Package.CompressPackageStream](api_package_compression.md#51-packagecompresspackagestream-method) @@ -419,14 +431,14 @@ Use this index to quickly locate specific API elements across the documentation. - DecompressPackage decompresses the package in memory Decompresses all compressed content Returns *PackageError on failure. - **`Package.DecompressPackageConcurrent`** - [Package.DecompressPackageConcurrent](api_package_compression.md#832-packagedecompresspackageconcurrent-method) - DecompressPackageConcurrent decompresses package content using worker pool Returns *PackageError on failure. -- **`Package.decompressPackageContent`** - [Package.decompressPackageContent](api_package_compression.md#732-packagedecompresspackagecontent-method) - - Package.decompressPackageContent Returns *PackageError on failure. 
- **`Package.DecompressPackageFile`** - [Package.DecompressPackageFile](api_package_compression.md#62-packagedecompresspackagefile-method) - DecompressPackageFile decompresses the package and writes to specified path Decompresses all compressed content and writes uncompressed package Returns *PackageError on failure. - **`Package.DecompressPackageStream`** - [Package.DecompressPackageStream](api_package_compression.md#52-packagedecompresspackagestream-method) - DecompressPackageStream decompresses large package content using streaming Uses streaming to manage memory efficiently for large packages Returns *PackageError on failure. - **`Package.GetFileCompressionInfo`** - [Package.GetFileCompressionInfo](api_file_mgmt_compression.md#35-packagegetfilecompressioninfo-method) - GetFileCompressionInfo gets compression information for a file by path This is a convenience wrapper that looks up the FileEntry and calls GetCompressionInfo. +- **`Package.GetMetadataIndexOffset`** - [Package.GetMetadataIndexOffset](api_package_compression.md#7292-packagegetmetadataindexoffset-method) + - GetMetadataIndexOffset returns the offset to metadata index Returns fixed offset 112 bytes (PackageHeaderSize) when compression enabled Returns *PackageError if package is not compressed (no metadata index). - **`Package.GetPackageCompressedSize`** - [Package.GetPackageCompressedSize](api_package_compression.md#727-packagegetpackagecompressedsize-method) - GetPackageCompressedSize returns the compressed size Returns *PackageError if package is not compressed. - **`Package.GetPackageCompressionInfo`** - [Package.GetPackageCompressionInfo](api_package_compression.md#722-packagegetpackagecompressioninfo-method) @@ -437,6 +449,8 @@ Use this index to quickly locate specific API elements across the documentation. - GetPackageCompressionType returns the package compression type Returns compression type from header flags bits 15-8 Returns *PackageError if package is not compressed. 
- **`Package.GetPackageOriginalSize`** - [Package.GetPackageOriginalSize](api_package_compression.md#726-packagegetpackageoriginalsize-method) - GetPackageOriginalSize returns the original size before compression Returns *PackageError if package is not compressed. +- **`Package.HasMetadataIndex`** - [Package.HasMetadataIndex](api_package_compression.md#7291-packagehasmetadataindex-method) + - HasMetadataIndex checks if package has metadata index (compression enabled) Returns true if header flags bits 15-8 != 0. - **`Package.IsPackageCompressed`** - [Package.IsPackageCompressed](api_package_compression.md#723-packageispackagecompressed-method) - IsPackageCompressed checks if the package is compressed Checks header flags bits 15-8 for compression type. - **`Package.ListCompressedFiles`** - [Package.ListCompressedFiles](api_file_mgmt_queries.md#421-packagelistcompressedfiles-method) @@ -445,12 +459,34 @@ Use this index to quickly locate specific API elements across the documentation. - SetPackageCompressionType sets the package compression type (without compressing) Returns *PackageError on failure. - **`Package.ValidateCompressionData`** - [7.2.10.3 Package.ValidateCompressionData Method](api_package_compression.md#72103-packagevalidatecompressiondata-method) - Package.ValidateCompressionData Returns *PackageError on failure. +- **`Package.compressPackageContent`** - [Package.compressPackageContent](api_package_compression.md#731-packagecompresspackagecontent-method) + - Package.compressPackageContent Internal compression methods (used by CompressPackage and Write) Returns *PackageError on failure. +- **`Package.decompressPackageContent`** - [Package.decompressPackageContent](api_package_compression.md#732-packagedecompresspackagecontent-method) + - Package.decompressPackageContent Returns *PackageError on failure. 
+ +### 1.14 Package Path and Configuration Methods + +- **`Package.ClearSessionBase`** - [Package.ClearSessionBase](api_basic_operations.md#196-packageclearsessionbase-method) + - ClearSessionBase clears the current session base path. +- **`Package.GetSessionBase`** - [Package.GetSessionBase](api_basic_operations.md#195-packagegetsessionbase-method) + - GetSessionBase returns the current session base path Returns empty string if no session base has been established. +- **`Package.SetSessionBase`** - [Package.SetSessionBase](api_basic_operations.md#194-packagesetsessionbase-method) + - SetSessionBase explicitly sets the package-level session base path This method allows setting the session base before any file operations Returns *PackageError on failure (e.g., invalid path format). +- **`Package.SetTargetPath`** - [Package.SetTargetPath](api_basic_operations.md#8-packagesettargetpath-method) + - SetTargetPath changes the package's target write path Returns *PackageError on failure. -### 1.6 Package Path and Configuration Methods +### 1.15 Package File Encryption Methods -This subsection groups package path and runtime configuration methods (for example, target path and session base management). +- **`Package.DecryptFile`** - [Package.DecryptFile](api_security.md#462-packagedecryptfile-method) + - DecryptFile decrypts a file in the package. +- **`Package.EncryptFile`** - [Package.EncryptFile](api_security.md#461-packageencryptfile-method) + - EncryptFile encrypts a file in the package. +- **`Package.GetFileEncryptionInfo`** - [Package.GetFileEncryptionInfo](api_security.md#464-packagegetfileencryptioninfo-method) + - GetFileEncryptionInfo returns encryption information for a file. +- **`Package.ValidateFileEncryption`** - [Package.ValidateFileEncryption](api_security.md#463-packagevalidatefileencryption-method) + - ValidateFileEncryption validates encryption state and metadata for a file. 
-### 1.7 Package Signature Management Methods +### 1.16 Package Signature Management Methods - **`Package.AddSignature`** - [Package.AddSignature](api_signatures.md#111-packageaddsignature-method) - AddSignature adds a new digital signature (appends incrementally) LOW-LEVEL: Use when you have pre-computed signature data Automatically sets the "Has signatures" bit (Flags Bit 0) and SignatureOffset if this is the first signature Ensures signature integrity by validating header state before signing Returns *PackageError on failure. @@ -501,35 +537,26 @@ This subsection groups package path and runtime configuration methods (for examp - **`Package.ValidateX509SignatureWithChain`** - [Package.ValidateX509SignatureWithChain](api_signatures.md#2732-packagevalidatex509signaturewithchain-method) - Package.ValidateX509SignatureWithChain Returns *PackageError on failure. -### 1.8 Package Other Methods +### 1.17 Package Write Methods -- **`Package.SafeWrite`** - [Package.SafeWrite](api_writing.md#11-packagesafewrite-method) - - SafeWrite writes the package atomically to the configured target path. - **`Package.FastWrite`** - [Package.FastWrite](api_writing.md#21-packagefastwrite-method) - FastWrite performs in-place updates to an existing package file. +- **`Package.SafeWrite`** - [Package.SafeWrite](api_writing.md#11-packagesafewrite-method) + - SafeWrite writes the package atomically to the configured target path. - **`Package.Write`** - [Package.Write](api_writing.md#533-packagewrite-method) - Write selects the appropriate write strategy (SafeWrite or FastWrite) based on package state. -- **`Package.EncryptFile`** - [Package.EncryptFile](api_security.md#461-packageencryptfile-method) - - EncryptFile encrypts a file in the package. -- **`Package.DecryptFile`** - [Package.DecryptFile](api_security.md#462-packagedecryptfile-method) - - DecryptFile decrypts a file in the package. 
-- **`Package.ValidateFileEncryption`** - [Package.ValidateFileEncryption](api_security.md#463-packagevalidatefileencryption-method) - - ValidateFileEncryption validates encryption state and metadata for a file. -- **`Package.GetFileEncryptionInfo`** - [Package.GetFileEncryptionInfo](api_security.md#464-packagegetfileencryptioninfo-method) - - GetFileEncryptionInfo returns encryption information for a file. -### 1.9 Package Helper Functions +### 1.18 Package Other Methods + +- **`Package.ReadFile`** - [Package.ReadFile](api_core.md#122-packagereadfile-method) + - ReadFile reads file content from the package, applying decryption and decompression. +- **`readOnlyPackage.readOnlyError`** - [readOnlyPackage.readOnlyError](api_basic_operations.md#114-readonlypackagereadonlyerror-method) + - readOnlyError creates a structured security error for read-only enforcement. + +### 1.19 Package Helper Functions - **`NewPackage`** - [6. NewPackage Function](api_basic_operations.md#6-newpackage-function) - NewPackage creates a new empty package. -- **`NewPackageComment`** - [1.3.5 NewPackageComment Function](api_metadata.md#135-newpackagecomment-function) - - NewPackageComment creates and returns a new PackageComment with zero values. -- **`NewPackageError`** - [10.5.1 NewPackageError Function](api_core.md#1051-newpackageerror-function) - - NewPackageError creates a structured error with type-safe context All errors must include typed context for type safety. -- **`NewPackageHeader`** - [2.8.2 NewPackageHeader Function](package_file_format.md#282-newpackageheader-function) - - NewPackageHeader creates and returns a new PackageHeader with default values. -- **`NewPackageInfo`** - [7.1.2 NewPackageInfo Function](api_metadata.md#712-newpackageinfo-function) - - NewPackageInfo creates a new PackageInfo with default values. - **`NewPackageWithOptions`** - [7. 
NewPackageWithOptions Function](api_basic_operations.md#7-newpackagewithoptions-function) - NewPackageWithOptions creates a new package with specified configuration options Returns *PackageError on failure. - **`NormalizePackagePath`** - [Normalizepackagepath](api_core.md#21-path-normalization-rules) @@ -563,144 +590,122 @@ This subsection groups package path and runtime configuration methods (for examp - Return: warnings: Non-fatal portability warnings. - Error: ErrTypeValidation when the path exceeds the hard limit. -## 2. PackageReader Interface Types - -- **`PackageReader`** - [1.2 PackageReader Interface](api_core.md#12-packagereader-interface) - - PackageReader defines the read-only interface for opened packages. - -### 2.1 PackageReader Methods - -This subsection groups `PackageReader` methods by operational category. - -#### 2.1.1 PackageReader Read Operations - -This subsection groups file read and streaming-oriented `PackageReader` operations. - -#### 2.1.2 PackageReader Query Operations - -This subsection groups query and inspection `PackageReader` operations. - -#### 2.1.3 PackageReader Other Methods - -This subsection groups remaining `PackageReader` operations not classified above. - -### 2.2 PackageReader Helper Functions - -This subsection groups helper functions related to `PackageReader` usage. - -## 3. PackageWriter Interface Types - -- **`PackageWriter`** - [1.3 PackageWriter Interface](api_core.md#13-packagewriter-interface) - - PackageWriter defines the write interface for persisting a package to disk. - -### 3.1 PackageWriter Methods - -This subsection groups `PackageWriter` methods by operational category. - -#### 3.1.1 PackageWriter Write Operations - -This subsection groups write and persistence-oriented `PackageWriter` operations. - -#### 3.1.2 PackageWriter Other Methods - -This subsection groups remaining `PackageWriter` operations not classified above. 
- -### 3.2 PackageWriter Helper Functions - -This subsection groups helper functions related to `PackageWriter` usage. - -## 4. FileEntry Types +## 2. FileEntry Types +- **`AddFileOptions`** - [2.8 AddFileOptions Struct](api_file_mgmt_addition.md#28-addfileoptions-struct) + - AddFileOptions configures file addition behavior (path determination, metadata preservation, and processing options). +- **`ExtractPathOptions`** - [2. ExtractPathOptions Struct](api_file_mgmt_extraction.md#2-extractpathoptions-struct) + - ExtractPathOptions configures filesystem extraction behavior. - **`FileEntry`** - [1.1 FileEntry Structure Definition](api_file_mgmt_file_entry.md#11-fileentry-structure-definition) - FileEntry represents a FileEntry in the package with complete metadata. +- **`FileInfo`** - [1.2.4 FileInfo Structure](api_core.md#124-fileinfo-structure) + - FileInfo provides lightweight file information for listing operations. +- **`FileMetadataUpdate`** - [1.3.3 FileMetadataUpdate Structure](api_file_mgmt_updates.md#133-filemetadataupdate-structure) + - FileMetadataUpdate contains metadata updates for a FileEntry. +- **`FileSource`** - [16. FileSource Structure](api_file_mgmt_file_entry.md#16-filesource-structure) + - FileSource represents a source location for file data (original or intermediate). - **`HashEntry`** - [11. HashEntry Struct](api_file_mgmt_file_entry.md#11-hashentry-struct) - HashEntry represents a hash with type and purpose. -- **`HashType`** - [12. HashType Type](api_file_mgmt_file_entry.md#12-hashtype-type) - - HashType represents hash algorithm types. - **`HashPurpose`** - [13. HashPurpose Type](api_file_mgmt_file_entry.md#13-hashpurpose-type) - HashPurpose represents the purpose for a hash (for example, deduplication vs integrity). -- **`ProcessingState`** - [15. ProcessingState Type](api_file_mgmt_file_entry.md#15-processingstate-type) - - ProcessingState defines the current state of file data transformations. -- **`FileSource`** - [16. 
FileSource Structure](api_file_mgmt_file_entry.md#16-filesource-structure) - - FileSource represents a source location for file data (original or intermediate). +- **`HashType`** - [12. HashType Type](api_file_mgmt_file_entry.md#12-hashtype-type) + - HashType represents hash algorithm types. - **`OptionalData`** - [17. OptionalData Structure](api_file_mgmt_file_entry.md#17-optionaldata-structure) - OptionalData represents structured optional data for a FileEntry. - **`OptionalDataType`** - [18. OptionalDataType Type](api_file_mgmt_file_entry.md#18-optionaldatatype-type) - OptionalDataType identifies optional data payload types. +- **`ProcessingState`** - [15. ProcessingState Type](api_file_mgmt_file_entry.md#15-processingstate-type) + - ProcessingState defines the current state of file data transformations. +- **`RemoveDirectoryOptions`** - [4.4 RemoveDirectoryOptions Struct](api_file_mgmt_removal.md#44-removedirectoryoptions-struct) + - RemoveDirectoryOptions configures directory removal behavior. +- **`Tag`** - [19.1 Tag Struct](api_file_mgmt_file_entry.md#191-tag-struct) + - Tag represents a type-safe tag with a specific value type. +- **`TagValueType`** - [14. TagValueType Type](api_file_mgmt_file_entry.md#14-tagvaluetype-type) + - TagValueType represents the type of a tag value. +- **`TransformPipeline`** - [2.2 TransformPipeline Structure](api_file_mgmt_transform_pipelines.md#22-transformpipeline-structure) + - TransformPipeline tracks a multi-stage transformation pipeline for large or multi-step operations. +- **`TransformStage`** - [2.3 TransformStage Structure](api_file_mgmt_transform_pipelines.md#23-transformstage-structure) + - TransformStage represents a single transformation stage. +- **`TransformType`** - [2.4 TransformType Type](api_file_mgmt_transform_pipelines.md#24-transformtype-type) + - TransformType identifies the type of a transformation stage (compress, encrypt, etc.). 
-### 4.1 FileEntry Methods - -This subsection groups `FileEntry` methods by operational category. - -#### 4.1.1 FileEntry Data Management Methods +### 2.1 FileEntry Query Methods -- **`FileEntry.CleanupTempFile`** - [FileEntry.CleanupTempFile](api_file_mgmt_file_entry.md#425-fileentrycleanuptempfile-method) - - CleanupTempFile removes temporary files. -- **`FileEntry.CopyCurrentToOriginal`** - [FileEntry.CopyCurrentToOriginal](api_file_mgmt_file_entry.md#448-fileentrycopycurrenttooriginal-method) - - CopyCurrentToOriginal saves current source as original (for transformations). +- **`FileEntry.FixedSize`** - [FileEntry.FixedSize](api_file_mgmt_file_entry.md#663-fileentryfixedsize-method) + - FixedSize returns the fixed-size portion of the FileEntry size in bytes. +- **`FileEntry.GetCompressionInfo`** - [FileEntry.GetCompressionInfo](api_file_mgmt_file_entry.md#83-fileentrygetcompressioninfo-method) + - GetCompressionInfo returns compression details for the entry. - **`FileEntry.GetCurrentSource`** - [FileEntry.GetCurrentSource](api_file_mgmt_file_entry.md#442-fileentrygetcurrentsource-method) - GetCurrentSource returns the current data source Returns nil if no current source is set. -- **`FileEntry.GetData`** - [FileEntry.GetData](api_file_mgmt_file_entry.md#413-fileentrygetdata-method) - - GetData returns the in-memory file data. - - Returns *PackageError on failure. +- **`FileEntry.GetDirectoryDepth`** - [FileEntry.GetDirectoryDepth](api_file_mgmt_file_entry.md#510-fileentrygetdirectorydepth-method) + - GetDirectoryDepth returns the directory depth for the primary path. - **`FileEntry.GetEncryptionType`** - [FileEntry.GetEncryptionType](api_file_mgmt_file_entry.md#73-fileentrygetencryptiontype-method) - GetEncryptionType returns the encryption type used for this file. +- **`FileEntry.GetFileID`** - [FileEntry.GetFileID](api_file_mgmt_file_entry.md#58-fileentrygetfileid-method) + - GetFileID returns the FileEntry unique identifier. 
- **`FileEntry.GetOriginalSource`** - [FileEntry.GetOriginalSource](api_file_mgmt_file_entry.md#444-fileentrygetoriginalsource-method) - GetOriginalSource returns the original data source. - Returns nil if no original source is tracked (e.g., new files). -- **`FileEntry.GetPrimaryPath`** - [FileEntry.GetPrimaryPath](api_file_mgmt_file_entry.md#53-fileentrygetprimarypath-method) - - GetPrimaryPath returns the primary path in display format (no leading slash). - - The returned string MUST use forward slashes as separators. - - For platform-specific filesystem display, convert manually or use path conversion utilities. - **`FileEntry.GetProcessingState`** - [FileEntry.GetProcessingState](api_file_mgmt_file_entry.md#431-fileentrygetprocessingstate-method) - GetProcessingState returns the current processing state. -- **`FileEntry.GetSymlinkPaths`** - [FileEntry.GetSymlinkPaths](api_file_mgmt_file_entry.md#52-fileentrygetsymlinkpaths-method) - - GetSymlinkPaths returns all symlink paths associated with this FileEntry. - **`FileEntry.GetTransformPipeline`** - [FileEntry.GetTransformPipeline](api_file_mgmt_file_entry.md#452-fileentrygettransformpipeline-method) - GetTransformPipeline returns the current transformation pipeline. - Returns nil if no pipeline is active. -- **`FileEntry.AssociateWithPathMetadata`** - [FileEntry.AssociateWithPathMetadata](api_file_mgmt_file_entry.md#55-fileentryassociatewithpathmetadata-method) - - AssociateWithPathMetadata associates the FileEntry with a PathMetadataEntry. -- **`FileEntry.GetPathMetadataForPath`** - [FileEntry.GetPathMetadataForPath](api_file_mgmt_file_entry.md#56-fileentrygetpathmetadataforpath-method) - - GetPathMetadataForPath returns the PathMetadataEntry for a given stored path. -- **`FileEntry.IsCompressed`** - [FileEntry.IsCompressed](api_file_mgmt_file_entry.md#71-fileentryiscompressed-method) - - IsCompressed returns true if the file is compressed. 
-- **`FileEntry.GetCompressionInfo`** - [FileEntry.GetCompressionInfo](api_file_mgmt_file_entry.md#83-fileentrygetcompressioninfo-method) - - GetCompressionInfo returns compression details for the entry. - **`FileEntry.HasEncryptionKey`** - [FileEntry.HasEncryptionKey](api_file_mgmt_file_entry.md#72-fileentryhasencryptionkey-method) - HasEncryptionKey checks if the file has an encryption key set. - **`FileEntry.HasOriginalSource`** - [FileEntry.HasOriginalSource](api_file_mgmt_file_entry.md#446-fileentryhasoriginalsource-method) - - HasOriginalSource returns true if original source is tracked. -- **`FileEntry.HasSymlinks`** - [FileEntry.HasSymlinks](api_file_mgmt_file_entry.md#51-fileentryhassymlinks-method) - - HasSymlinks returns true if the FileEntry has any symlink paths. -- **`FileEntry.IsCurrentSourceTempFile`** - [FileEntry.IsCurrentSourceTempFile](api_file_mgmt_file_entry.md#445-fileentryiscurrentsourcetempfile-method) - - IsCurrentSourceTempFile returns true if current source is a temporary file. + - HasOriginalSource returns true if original source is tracked. +- **`FileEntry.IsCompressed`** - [FileEntry.IsCompressed](api_file_mgmt_file_entry.md#71-fileentryiscompressed-method) + - IsCompressed returns true if the file is compressed. - **`FileEntry.IsEncrypted`** - [FileEntry.IsEncrypted](api_file_mgmt_file_entry.md#74-fileentryisencrypted-method) - IsEncrypted checks if the file is encrypted. +- **`FileEntry.IsRootRelative`** - [FileEntry.IsRootRelative](api_file_mgmt_file_entry.md#511-fileentryisrootrelative-method) + - IsRootRelative returns true if the FileEntry paths are root-relative. +- **`FileEntry.ReadFrom`** - [FileEntry.ReadFrom](api_file_mgmt_file_entry.md#661-fileentryreadfrom-method) + - ReadFrom reads FileEntry metadata from an io.Reader. +- **`FileEntry.TotalSize`** - [FileEntry.TotalSize](api_file_mgmt_file_entry.md#665-fileentrytotalsize-method) + - TotalSize returns the total size of the FileEntry in bytes. 
+- **`FileEntry.VariableSize`** - [FileEntry.VariableSize](api_file_mgmt_file_entry.md#664-fileentryvariablesize-method) + - VariableSize returns the variable-size portion of the FileEntry size in bytes. + +### 2.2 FileEntry Data Methods + +- **`FileEntry.GetData`** - [FileEntry.GetData](api_file_mgmt_file_entry.md#413-fileentrygetdata-method) + - GetData returns the in-memory file data. + - Returns *PackageError on failure. - **`FileEntry.LoadData`** - [FileEntry.LoadData](api_file_mgmt_file_entry.md#101-fileentryloaddata-method) - LoadData loads the file data into memory. +- **`FileEntry.SetData`** - [FileEntry.SetData](api_file_mgmt_file_entry.md#414-fileentrysetdata-method) + - SetData sets the in-memory file data. +- **`FileEntry.UnloadData`** - [FileEntry.UnloadData](api_file_mgmt_file_entry.md#412-fileentryunloaddata-method) + - UnloadData unloads file data from memory. + +### 2.3 FileEntry Temp File Methods + +- **`FileEntry.CleanupTempFile`** - [FileEntry.CleanupTempFile](api_file_mgmt_file_entry.md#425-fileentrycleanuptempfile-method) + - CleanupTempFile removes temporary files. +- **`FileEntry.CreateTempFile`** - [FileEntry.CreateTempFile](api_file_mgmt_file_entry.md#421-fileentrycreatetempfile-method) + - CreateTempFile creates a temporary file for staging file data. +- **`FileEntry.IsCurrentSourceTempFile`** - [FileEntry.IsCurrentSourceTempFile](api_file_mgmt_file_entry.md#445-fileentryiscurrentsourcetempfile-method) + - IsCurrentSourceTempFile returns true if current source is a temporary file. +- **`FileEntry.ReadFromTempFile`** - [FileEntry.ReadFromTempFile](api_file_mgmt_file_entry.md#424-fileentryreadfromtempfile-method) + - ReadFromTempFile reads data from a temporary file. +- **`FileEntry.StreamToTempFile`** - [FileEntry.StreamToTempFile](api_file_mgmt_file_entry.md#422-fileentrystreamtotempfile-method) + - StreamToTempFile streams data to a temporary file Returns *PackageError on failure. 
+- **`FileEntry.WriteToTempFile`** - [FileEntry.WriteToTempFile](api_file_mgmt_file_entry.md#423-fileentrywritetotempfile-method) + - WriteToTempFile writes data to a temporary file. + - Returns *PackageError on failure. + +### 2.4 FileEntry Serialization Methods + - **`FileEntry.Marshal`** - [FileEntry.Marshal](api_file_mgmt_file_entry.md#613-fileentrymarshal-method) - Marshal marshals both FileEntry metadata and data. - Returns metadata and data as separate byte slices for flexible writing. - Returns *PackageError on failure. -- **`FileEntry.MarshalMeta`** - [FileEntry.MarshalMeta](api_file_mgmt_file_entry.md#611-fileentrymarshalmeta-method) - - MarshalMeta marshals FileEntry metadata to bytes. - **`FileEntry.MarshalData`** - [FileEntry.MarshalData](api_file_mgmt_file_entry.md#612-fileentrymarshaldata-method) - MarshalData marshals FileEntry data to bytes. -- **`FileEntry.ReadFromTempFile`** - [FileEntry.ReadFromTempFile](api_file_mgmt_file_entry.md#424-fileentryreadfromtempfile-method) - - ReadFromTempFile reads data from a temporary file. -- **`FileEntry.CreateTempFile`** - [FileEntry.CreateTempFile](api_file_mgmt_file_entry.md#421-fileentrycreatetempfile-method) - - CreateTempFile creates a temporary file for staging file data. -- **`FileEntry.ResolveAllSymlinks`** - [FileEntry.ResolveAllSymlinks](api_file_mgmt_file_entry.md#54-fileentryresolveallsymlinks-method) - - ResolveAllSymlinks resolves all symlink paths to their target paths. -- **`FileEntry.StreamToTempFile`** - [FileEntry.StreamToTempFile](api_file_mgmt_file_entry.md#422-fileentrystreamtotempfile-method) - - StreamToTempFile streams data to a temporary file Returns *PackageError on failure. -- **`FileEntry.UnloadData`** - [FileEntry.UnloadData](api_file_mgmt_file_entry.md#412-fileentryunloaddata-method) - - UnloadData unloads file data from memory. 
-- **`FileEntry.UnsetEncryptionKey`** - [FileEntry.UnsetEncryptionKey](api_file_mgmt_file_entry.md#94-fileentryunsetencryptionkey-method) - - UnsetEncryptionKey removes the encryption key from the file. -- **`FileEntry.ValidateSources`** - [FileEntry.ValidateSources](api_file_mgmt_file_entry.md#456-fileentryvalidatesources-method) - - ValidateSources validates CurrentSource, OriginalSource, and pipeline consistency Returns *PackageError if validation fails. +- **`FileEntry.MarshalMeta`** - [FileEntry.MarshalMeta](api_file_mgmt_file_entry.md#611-fileentrymarshalmeta-method) + - MarshalMeta marshals FileEntry metadata to bytes. - **`FileEntry.WriteDataTo`** - [FileEntry.WriteDataTo](api_file_mgmt_file_entry.md#622-fileentrywritedatato-method) - WriteDataTo writes the FileEntry data to a writer. - Implements efficient streaming for large files. @@ -713,27 +718,47 @@ This subsection groups `FileEntry` methods by operational category. - WriteTo writes both metadata and data to a writer. - Implements io.WriterTo interface. - Returns *PackageError on failure. -- **`FileEntry.WriteToTempFile`** - [FileEntry.WriteToTempFile](api_file_mgmt_file_entry.md#423-fileentrywritetotempfile-method) - - WriteToTempFile writes data to a temporary file. - - Returns *PackageError on failure. -#### 4.1.2 FileEntry Transformation Methods +### 2.5 FileEntry Path Methods + +- **`FileEntry.AssociateWithPathMetadata`** - [FileEntry.AssociateWithPathMetadata](api_file_mgmt_file_entry.md#55-fileentryassociatewithpathmetadata-method) + - AssociateWithPathMetadata associates the FileEntry with a PathMetadataEntry. +- **`FileEntry.GetParentPath`** - [FileEntry.GetParentPath](api_file_mgmt_file_entry.md#59-fileentrygetparentpath-method) + - GetParentPath returns the parent directory path for the FileEntry primary path. 
+- **`FileEntry.GetPathMetadataForPath`** - [FileEntry.GetPathMetadataForPath](api_file_mgmt_file_entry.md#56-fileentrygetpathmetadataforpath-method) + - GetPathMetadataForPath returns the PathMetadataEntry for a given stored path. +- **`FileEntry.GetPaths`** - [FileEntry.GetPaths](api_file_mgmt_file_entry.md#57-fileentrygetpaths-method) + - GetPaths returns all stored paths for this FileEntry. +- **`FileEntry.GetPrimaryPath`** - [FileEntry.GetPrimaryPath](api_file_mgmt_file_entry.md#53-fileentrygetprimarypath-method) + - GetPrimaryPath returns the primary path in display format (no leading slash). + - The returned string MUST use forward slashes as separators. + - For platform-specific filesystem display, convert manually or use path conversion utilities. +- **`FileEntry.GetSymlinkPaths`** - [FileEntry.GetSymlinkPaths](api_file_mgmt_file_entry.md#52-fileentrygetsymlinkpaths-method) + - GetSymlinkPaths returns all symlink paths associated with this FileEntry. +- **`FileEntry.HasSymlinks`** - [FileEntry.HasSymlinks](api_file_mgmt_file_entry.md#51-fileentryhassymlinks-method) + - HasSymlinks returns true if the FileEntry has any symlink paths. +- **`FileEntry.ResolveAllSymlinks`** - [FileEntry.ResolveAllSymlinks](api_file_mgmt_file_entry.md#54-fileentryresolveallsymlinks-method) + - ResolveAllSymlinks resolves all symlink paths to their target paths. + +### 2.6 FileEntry Transformation Methods - **`FileEntry.CleanupTransformPipeline`** - [FileEntry.CleanupTransformPipeline](api_file_mgmt_file_entry.md#455-fileentrycleanuptransformpipeline-method) - CleanupTransformPipeline cleans up all temporary files in pipeline. - Returns *PackageError on failure. - **`FileEntry.Compress`** - [FileEntry.Compress](api_file_mgmt_file_entry.md#81-fileentrycompress-method) - Compress applies compression to the FileEntry data. 
-- **`FileEntry.Decrypt`** - [FileEntry.Decrypt](api_file_mgmt_file_entry.md#93-fileentrydecrypt-method) - - Decrypt decrypts data using the file's encryption key. +- **`FileEntry.CopyCurrentToOriginal`** - [FileEntry.CopyCurrentToOriginal](api_file_mgmt_file_entry.md#448-fileentrycopycurrenttooriginal-method) + - CopyCurrentToOriginal saves current source as original (for transformations). - **`FileEntry.Decompress`** - [FileEntry.Decompress](api_file_mgmt_file_entry.md#82-fileentrydecompress-method) - Decompress reverses compression on the FileEntry data. +- **`FileEntry.Decrypt`** - [FileEntry.Decrypt](api_file_mgmt_file_entry.md#93-fileentrydecrypt-method) + - Decrypt decrypts data using the file's encryption key. - **`FileEntry.Encrypt`** - [FileEntry.Encrypt](api_file_mgmt_file_entry.md#92-fileentryencrypt-method) - Encrypt encrypts data using the file's encryption key. -- **`FileEntry.InitializeTransformPipeline`** - [FileEntry.InitializeTransformPipeline](api_file_mgmt_file_entry.md#451-fileentryinitializetransformpipeline-method) - - InitializeTransformPipeline creates a new transformation pipeline. - **`FileEntry.ExecuteTransformStage`** - [FileEntry.ExecuteTransformStage](api_file_mgmt_file_entry.md#453-fileentryexecutetransformstage-method) - ExecuteTransformStage executes a single transformation stage. +- **`FileEntry.InitializeTransformPipeline`** - [FileEntry.InitializeTransformPipeline](api_file_mgmt_file_entry.md#451-fileentryinitializetransformpipeline-method) + - InitializeTransformPipeline creates a new transformation pipeline. - **`FileEntry.ProcessData`** - [FileEntry.ProcessData](api_file_mgmt_file_entry.md#102-fileentryprocessdata-method) - ProcessData processes FileEntry data through the configured pipeline. - **`FileEntry.ResumeTransformation`** - [FileEntry.ResumeTransformation](api_file_mgmt_file_entry.md#454-fileentryresumetransformation-method) @@ -741,8 +766,6 @@ This subsection groups `FileEntry` methods by operational category. 
- Returns *PackageError on failure. - **`FileEntry.SetCurrentSource`** - [FileEntry.SetCurrentSource](api_file_mgmt_file_entry.md#441-fileentrysetcurrentsource-method) - SetCurrentSource sets the current data source for the FileEntry Returns *PackageError if source is invalid. -- **`FileEntry.SetData`** - [FileEntry.SetData](api_file_mgmt_file_entry.md#414-fileentrysetdata-method) - - SetData sets the in-memory file data. - **`FileEntry.SetEncryptionKey`** - [FileEntry.SetEncryptionKey](api_file_mgmt_file_entry.md#91-fileentrysetencryptionkey-method) - SetEncryptionKey sets the encryption key for the file. - **`FileEntry.SetOriginalSource`** - [FileEntry.SetOriginalSource](api_file_mgmt_file_entry.md#443-fileentrysetoriginalsource-method) @@ -751,45 +774,114 @@ This subsection groups `FileEntry` methods by operational category. - SetOriginalSourceFromPackage creates original source pointing to package file. - **`FileEntry.SetProcessingState`** - [FileEntry.SetProcessingState](api_file_mgmt_file_entry.md#432-fileentrysetprocessingstate-method) - SetProcessingState sets the current processing state. +- **`FileEntry.UnsetEncryptionKey`** - [FileEntry.UnsetEncryptionKey](api_file_mgmt_file_entry.md#94-fileentryunsetencryptionkey-method) + - UnsetEncryptionKey removes the encryption key from the file. +- **`FileEntry.Validate`** - [FileEntry.Validate](api_file_mgmt_file_entry.md#662-fileentryvalidate-method) + - Validate validates the FileEntry and returns an error on invalid state. +- **`FileEntry.ValidateSources`** - [FileEntry.ValidateSources](api_file_mgmt_file_entry.md#456-fileentryvalidatesources-method) + - ValidateSources validates CurrentSource, OriginalSource, and pipeline consistency Returns *PackageError if validation fails. 
-### 4.2 FileEntry Helper Functions +### 2.7 FileEntry Helper Functions +- **`AddFileEntryTag`** - [Addfileentrytag](api_file_mgmt_file_entry.md#3126-addfileentrytag-function) + - AddFileEntryTag adds a type-safe tag to a FileEntry by key. +- **`AddFileEntryTags`** - [Addfileentrytags](api_file_mgmt_file_entry.md#3124-addfileentrytags-function) + - AddFileEntryTags adds multiple tags to a FileEntry. +- **`GetFileEntryTag`** - [Getfileentrytag](api_file_mgmt_file_entry.md#3123-getfileentrytag-function) + - GetFileEntryTag retrieves a type-safe tag by key from a FileEntry Returns the tag pointer and an error. + - If the tag is not found, returns (nil, nil). + - If an underlying error occurs, returns (nil, error). + - Returns *PackageError on failure If the tag type is unknown, use GetFileEntryTag[any](fe, "key") to retrieve the tag and inspect its Type field. +- **`GetFileEntryTags`** - [Getfileentrytags](api_file_mgmt_file_entry.md#3121-getfileentrytags-function) + - GetFileEntryTags returns all tags as typed tags for a FileEntry Returns *PackageError on failure. +- **`GetFileEntryTagsByType`** - [Getfileentrytagsbytype](api_file_mgmt_file_entry.md#3122-getfileentrytagsbytype-function) + - GetFileEntryTagsByType returns all tags of a specific type for a FileEntry. + - Returns a slice of Tag pointers with the specified type parameter T. + - Only tags matching the type T and corresponding TagValueType are returned. + - Returns *PackageError on failure (corruption, I/O). + - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. + - See api_generics.md for details. +- **`HasFileEntryTag`** - [Hasfileentrytag](api_file_mgmt_file_entry.md#3129-hasfileentrytag-function) + - HasFileEntryTag checks if a tag with the specified key exists on a FileEntry Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. 
+ - See api_generics.md for details. +- **`HasFileEntryTags`** - [Hasfileentrytags](api_file_mgmt_file_entry.md#31210-hasfileentrytags-function) + - HasFileEntryTags checks if the FileEntry has any tags. + - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. + - See api_generics.md for details. - **`NewFileEntry`** - [2.1.1 NewFileEntry Function Signature](api_file_mgmt_file_entry.md#211-newfileentry-function-signature) - NewFileEntry creates a new FileEntry with proper tag synchronization. +- **`NewTag`** - [Newtag](api_file_mgmt_file_entry.md#192-newtag-function) + - NewTag creates a new type-safe tag with the specified key, value, and type. +- **`RemoveFileEntryTag`** - [Removefileentrytag](api_file_mgmt_file_entry.md#3128-removefileentrytag-function) + - RemoveFileEntryTag removes a tag by key from a FileEntry. + - Returns *PackageError on failure. + - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. + - See api_generics.md for details. +- **`SetFileEntryTag`** - [Setfileentrytag](api_file_mgmt_file_entry.md#3127-setfileentrytag-function) + - SetFileEntryTag updates an existing tag with type safety for a FileEntry. + - Returns *PackageError if the tag key does not already exist Only modifies existing tags; does not create new tags Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. + - See api_generics.md for details. +- **`SetFileEntryTags`** - [Setfileentrytags](api_file_mgmt_file_entry.md#3125-setfileentrytags-function) + - SetFileEntryTags updates existing tags from a slice of typed tags for a FileEntry. + - Returns *PackageError if any tag key does not already exist. + - Only modifies tags that already exist; does not create new tags. 
+ - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. + - See api_generics.md for details. +- **`SyncFileEntryTags`** - [Syncfileentrytags](api_file_mgmt_file_entry.md#31211-syncfileentrytags-function) + - SyncFileEntryTags synchronizes tags with the underlying storage for a FileEntry. + - Returns *PackageError on failure. + - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. + - See api_generics.md for details. - **`UnmarshalFileEntry`** - [Unmarshalfileentry](api_file_mgmt_file_entry.md#63-unmarshalfileentry-function) - UnmarshalFileEntry unmarshals a FileEntry from binary data. - Unmarshals the FileEntry with proper tag synchronization. -## 5. Metadata Types +### 2.8 Tag Methods + +- **`Tag.GetValue`** - [Tag.GetValue](api_file_mgmt_file_entry.md#193-tagtgetvalue-method) + - GetValue returns the tag value as any. +- **`Tag.SetValue`** - [Tag.SetValue](api_file_mgmt_file_entry.md#194-tagtsetvalue-method) + - SetValue sets the tag value and updates the type. + +## 3. Package Metadata Types - **`ACLEntry`** - [8.1.6 ACLEntry Structure](api_metadata.md#816-aclentry-structure) - ACLEntry represents an Access Control List entry. -- **`DestPathInput`** - [8.1.12 DestPathInput Interface](api_metadata.md#8112-destpathinput-interface) +- **`CreateOptions`** - [7.6 CreateOptions Structure](api_basic_operations.md#76-createoptions-structure) + - CreateOptions configures package creation behavior (including initial metadata and settings). +- **`DestPathInput`** - [8.1.13 DestPathInput Interface](api_metadata.md#8113-destpathinput-interface) - DestPathInput is the allowed input type set for SetDestPath. - DestPathInput supports. - string: a single destination string. - - map[string]string: a map with keys "DestPath" and/or "DestPathWin" Note: The map form uses string keys for ergonomics in callers. 
- Keys other than "DestPath" and "DestPathWin" MUST be rejected with ErrTypeValidation. -- **`DestPathOverride`** - [8.1.11 DestPathOverride Structure](api_metadata.md#8111-destpathoverride-structure) + - map[string]string: a map with keys "DestPath" and/or "DestPathWin" Note: The map form uses string keys for ergonomics in callers. Keys other than "DestPath" and "DestPathWin" MUST be rejected with ErrTypeValidation. +- **`DestPathOverride`** - [8.1.12 DestPathOverride Structure](api_metadata.md#8112-destpathoverride-structure) - DestPathOverride specifies destination extraction directory overrides. - A nil field means "no override specified" for that field. -- **`FileMetadataUpdate`** - [1.3.3 FileMetadataUpdate Structure](api_file_mgmt_updates.md#133-filemetadataupdate-structure) - - FileMetadataUpdate contains metadata updates for a FileEntry. -- **`FilePathAssociation`** - [8.1.10 FilePathAssociation Structure](api_metadata.md#8110-filepathassociation-structure) +- **`DestPathSpec`** - [DestPathSpec](api_file_mgmt_extraction.md#1511-destpathspec-struct) + - DestPathSpec configures a destination path override for extraction. +- **`FileIndex`** - [6.1.2 FileIndex Struct](package_file_format.md#612-fileindex-struct) + - FileIndex represents the file index section of a package. +- **`FilePathAssociation`** - [8.1.11 FilePathAssociation Structure](api_metadata.md#8111-filepathassociation-structure) - FilePathAssociation links files to their path metadata. - **`IndexData`** - [5.5.5.3 IndexData Structure](api_metadata.md#5553-indexdata-structure) - IndexData contains index file data structure. +- **`IndexEntry`** - [6.1.1 IndexEntry Struct](package_file_format.md#611-indexentry-struct) + - IndexEntry represents a single file index entry. - **`ManifestData`** - [5.5.5.2 ManifestData Structure](api_metadata.md#5552-manifestdata-structure) - ManifestData contains manifest file data structure. 
- **`PackageComment`** - [1.2 PackageComment Structure](api_metadata.md#12-packagecomment-structure) - PackageComment represents the optional package comment section. +- **`PackageConfig`** - [9.1 PackageConfig Structure](api_basic_operations.md#91-packageconfig-structure) + - PackageConfig provides package-level configuration for path handling behavior. - **`PackageHeader`** - [7.1.5 PackageHeader Structure](api_metadata.md#715-packageheader-structure) - PackageHeader represents the fixed-size header of a NovusPack (.nvpk) file Size: 112 bytes (fixed). - **`PackageInfo`** - [7.1 PackageInfo Structure](api_metadata.md#71-packageinfo-structure) - PackageInfo contains comprehensive package information and metadata. - **`PathFileSystem`** - [8.1.5 PathFileSystem Structure](api_metadata.md#815-pathfilesystem-structure) - PathFileSystem contains filesystem-specific properties. -- **`PathInfo`** - [8.1.9 PathInfo Structure](api_metadata.md#819-pathinfo-structure) +- **`PathHandling`** - [9.3 PathHandling Type](api_basic_operations.md#93-pathhandling-type) + - PathHandling specifies how to handle multiple paths pointing to the same content. +- **`PathInfo`** - [8.1.10 PathInfo Structure](api_metadata.md#8110-pathinfo-structure) - PathInfo provides runtime path metadata information. - **`PathInheritance`** - [8.1.3 PathInheritance Structure](api_metadata.md#813-pathinheritance-structure) - PathInheritance controls tag inheritance behavior (for directories only). @@ -820,41 +912,24 @@ This subsection groups `FileEntry` methods by operational category. - SignatureInfo contains signature information for a package. - **`SpecialFileInfo`** - [5.5.5.1 SpecialFileInfo Structure](api_metadata.md#5551-specialfileinfo-structure) - SpecialFileInfo contains information about special metadata files in the package. 
+- **`SymlinkConvertOptions`** - [1.7 SymlinkConvertOptions Struct](api_file_mgmt_updates.md#17-symlinkconvertoptions-struct) + - SymlinkConvertOptions configures path-to-symlink conversion behavior. - **`SymlinkEntry`** - [8.5.8 SymlinkEntry Structure](api_metadata.md#858-symlinkentry-structure) - SymlinkEntry represents a symbolic link with metadata. - **`SymlinkFileSystem`** - [8.5.10 SymlinkFileSystem Structure](api_metadata.md#8510-symlinkfilesystem-structure) - SymlinkFileSystem contains filesystem-specific properties for symlinks. - **`SymlinkMetadata`** - [8.5.9 SymlinkMetadata Structure](api_metadata.md#859-symlinkmetadata-structure) - SymlinkMetadata contains symlink creation and modification information. -- **`Tag`** - [19.1 Tag Struct](api_file_mgmt_file_entry.md#191-tag-struct) - - Tag represents a type-safe tag with a specific value type. -- **`TagValueType`** - [14. TagValueType Type](api_file_mgmt_file_entry.md#14-tagvaluetype-type) - - TagValueType represents the type of a tag value. -- **`TransformStage`** - [2.3 TransformStage Structure](api_file_mgmt_transform_pipelines.md#23-transformstage-structure) - - TransformStage represents a single transformation stage. -### 5.1 Metadata Methods +### 3.1 Package Metadata Type Methods -- **`PackageComment.ReadFrom`** - [PackageComment.ReadFrom](api_metadata.md#133-packagecommentreadfrom-method) - - ReadFrom reads the comment from a reader. -- **`PackageComment.Size`** - [PackageComment.Size](api_metadata.md#131-packagecommentsize-method) - - Size returns the size of the package comment. -- **`PackageComment.Validate`** - [PackageComment.Validate](api_metadata.md#134-packagecommentvalidate-method) - - Validate validates the package comment Returns *PackageError on failure. -- **`PackageComment.WriteTo`** - [PackageComment.WriteTo](api_metadata.md#132-packagecommentwriteto-method) - - WriteTo writes the comment to a writer. 
-- **`PackageHeader.ToHeader`** - [PackageHeader.ToHeader](api_metadata.md#716-packageheadertoheader-method) - - ToHeader synchronizes PackageHeader fields from the provided PackageInfo. - - This method must only write fields that are represented in the header. - - It must not mutate fields that are computed by the writer pipeline (for example IndexStart, IndexSize, and CRC). - - Returns *PackageError on failure. +- **`PathMetadataEntry.AssociateWithFileEntry`** - [8.1.8.19 PathMetadataEntry.AssociateWithFileEntry Method](api_metadata.md#81819-pathmetadataentryassociatewithfileentry-method) + - PathMetadataEntry.AssociateWithFileEntry FileEntry association methods for PathMetadataEntry AssociateWithFileEntry associates this PathMetadataEntry with a FileEntry The association is established if the PathMetadataEntry.Path.Path matches one of the FileEntry.Paths Returns *PackageError on failure. - **`PackageInfo.FromHeader`** - [PackageInfo.FromHeader](api_metadata.md#714-packageinfofromheader-method) - FromHeader synchronizes PackageInfo fields from the provided PackageHeader. - This method must only copy data that is represented in the header. - It must not compute derived values that require scanning file entries or reading file data. - Returns *PackageError on failure. -- **`PathMetadataEntry.AssociateWithFileEntry`** - [8.1.8.19 PathMetadataEntry.AssociateWithFileEntry Method](api_metadata.md#81819-pathmetadataentryassociatewithfileentry-method) - - PathMetadataEntry.AssociateWithFileEntry FileEntry association methods for PathMetadataEntry AssociateWithFileEntry associates this PathMetadataEntry with a FileEntry The association is established if the PathMetadataEntry.Path.Path matches one of the FileEntry.Paths Returns *PackageError on failure. - **`PathMetadataEntry.GetAncestors`** - [8.1.8.16 PathMetadataEntry.GetAncestors Method](api_metadata.md#81816-pathmetadataentrygetancestors-method) - GetAncestors returns all ancestor path metadata entries up to the root. 
- **`PathMetadataEntry.GetAssociatedFileEntries`** - [8.1.8.20 PathMetadataEntry.GetAssociatedFileEntries Method](api_metadata.md#81820-pathmetadataentrygetassociatedfileentries-method) @@ -893,28 +968,30 @@ This subsection groups `FileEntry` methods by operational category. - IsRoot returns true if this path metadata entry represents the root path. - **`PathMetadataEntry.IsSymlink`** - [8.1.8.8 PathMetadataEntry.IsSymlink Method](api_metadata.md#8188-pathmetadataentryissymlink-method) - IsSymlink returns true if this path metadata entry represents a symlink. +- **`PackageComment.ReadFrom`** - [PackageComment.ReadFrom](api_metadata.md#133-packagecommentreadfrom-method) + - ReadFrom reads the comment from a reader. - **`PathMetadataEntry.ResolveSymlink`** - [8.1.8.10 PathMetadataEntry.ResolveSymlink Method](api_metadata.md#81810-pathmetadataentryresolvesymlink-method) - ResolveSymlink resolves the symlink to its final target path. - **`PathMetadataEntry.SetParentPath`** - [8.1.8.11 PathMetadataEntry.SetParentPath Method](api_metadata.md#81811-pathmetadataentrysetparentpath-method) - PathMetadataEntry.SetParentPath Parent path management methods for PathMetadataEntry. - **`PathMetadataEntry.SetPath`** - [8.1.8.1 PathMetadataEntry.SetPath Method](api_metadata.md#8181-pathmetadataentrysetpath-method) - PathMetadataEntry.SetPath Path management methods for PathMetadataEntry. -- **`Tag.GetValue`** - [Tag.GetValue](api_file_mgmt_file_entry.md#193-tagtgetvalue-method) - - GetValue returns the type-safe value of the tag. -- **`Tag.SetValue`** - [Tag.SetValue](api_file_mgmt_file_entry.md#194-tagtsetvalue-method) - - SetValue sets the type-safe value of the tag. +- **`PackageComment.Size`** - [PackageComment.Size](api_metadata.md#131-packagecommentsize-method) + - Size returns the size of the package comment. 
+- **`PackageHeader.ToHeader`** - [PackageHeader.ToHeader](api_metadata.md#716-packageheadertoheader-method) + - ToHeader synchronizes PackageHeader fields from the provided PackageInfo. + - This method must only write fields that are represented in the header. + - It must not mutate fields that are computed by the writer pipeline (for example IndexStart, IndexSize, and CRC). + - Returns *PackageError on failure. +- **`PackageComment.Validate`** - [PackageComment.Validate](api_metadata.md#134-packagecommentvalidate-method) + - Validate validates the package comment Returns *PackageError on failure. +- **`PathMetadataEntry.Validate`** - [8.1.9.1 PathMetadataEntry.Validate Method](api_metadata.md#8191-pathmetadataentryvalidate-method) + - Validate validates the PathMetadataEntry state and returns an error on failure. +- **`PackageComment.WriteTo`** - [PackageComment.WriteTo](api_metadata.md#132-packagecommentwriteto-method) + - WriteTo writes the comment to a writer. -### 5.2 Metadata Helper Functions +### 3.2 Package Metadata Helper Functions -- **`AddFileEntryTag`** - [Addfileentrytag](api_file_mgmt_file_entry.md#3126-addfileentrytag-function) - - AddFileEntryTag adds a new tag with type safety to a FileEntry. - - Returns *PackageError if a tag with the same key already exists Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. -- **`AddFileEntryTags`** - [Addfileentrytags](api_file_mgmt_file_entry.md#3124-addfileentrytags-function) - - AddFileEntryTags adds multiple new tags with type safety to a FileEntry. - - Returns *PackageError if any tag with the same key already exists. - - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. 
- **`AddPathMetaTag`** - [Addpathmetatag](api_metadata.md#8176-addpathmetatag-function) - AddPathMetaTag adds a new tag with type safety to a PathMetadataEntry Returns *PackageError if a tag with the same key already exists. - **`AddPathMetaTags`** - [Addpathmetatags](api_metadata.md#8173-addpathmetatags-function) @@ -927,26 +1004,6 @@ This subsection groups `FileEntry` methods by operational category. - CheckSignatureCommentLength validates signature comment length Returns *PackageError on failure. - **`DetectInjectionPatterns`** - [Detectinjectionpatterns](api_metadata.md#145-detectinjectionpatterns-function) - DetectInjectionPatterns scans comment for malicious patterns. -- **`GetFileEntryTag`** - [Getfileentrytag](api_file_mgmt_file_entry.md#3123-getfileentrytag-function) - - GetFileEntryTag retrieves a type-safe tag by key from a FileEntry. - - Returns the tag pointer and an error. - - If the tag is not found, returns (nil, nil). - - If an underlying error occurs (corruption, I/O), returns (nil, error). - - Returns *PackageError on failure. - - If the tag type is unknown, use GetFileEntryTag[any](fe, "key") to retrieve the tag and inspect its Type field. -- **`GetFileEntryTags`** - [Getfileentrytags](api_file_mgmt_file_entry.md#3121-getfileentrytags-function) - - GetFileEntryTags returns all tags as typed tags for a FileEntry. - - Returns a slice of Tag pointers, where each tag maintains its type information. - - Returns *PackageError on failure (corruption, I/O). - - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. -- **`GetFileEntryTagsByType`** - [Getfileentrytagsbytype](api_file_mgmt_file_entry.md#3122-getfileentrytagsbytype-function) - - GetFileEntryTagsByType returns all tags of a specific type for a FileEntry. - - Returns a slice of Tag pointers with the specified type parameter T. 
- - Only tags matching the type T and corresponding TagValueType are returned. - - Returns *PackageError on failure (corruption, I/O). - - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. - **`GetPathMetaTag`** - [Getpathmetatag](api_metadata.md#8175-getpathmetatag-function) - GetPathMetaTag retrieves a type-safe tag by key from a PathMetadataEntry Returns the tag pointer and an error. - If the tag is not found, returns (nil, nil). @@ -956,22 +1013,16 @@ This subsection groups `FileEntry` methods by operational category. - GetPathMetaTags returns all tags as typed tags for a PathMetadataEntry Returns *PackageError on failure. - **`GetPathMetaTagsByType`** - [Getpathmetatagsbytype](api_metadata.md#8172-getpathmetatagsbytype-function) - GetPathMetaTagsByType returns all tags of a specific type for a PathMetadataEntry Returns a slice of Tag pointers with the specified type parameter T Only tags matching the type T and corresponding TagValueType are returned Returns *PackageError on failure. -- **`HasFileEntryTag`** - [Hasfileentrytag](api_file_mgmt_file_entry.md#3129-hasfileentrytag-function) - - HasFileEntryTag checks if a tag with the specified key exists on a FileEntry Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. -- **`HasFileEntryTags`** - [Hasfileentrytags](api_file_mgmt_file_entry.md#31210-hasfileentrytags-function) - - HasFileEntryTags checks if the FileEntry has any tags. - - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. 
- **`HasPathMetaTag`** - [Haspathmetatag](api_metadata.md#8179-haspathmetatag-function) - HasPathMetaTag checks if a tag with the specified key exists on a PathMetadataEntry. -- **`NewTag`** - [Newtag](api_file_mgmt_file_entry.md#192-newtag-function) - - NewTag creates a new type-safe tag with the specified key, value, and type. -- **`RemoveFileEntryTag`** - [Removefileentrytag](api_file_mgmt_file_entry.md#3128-removefileentrytag-function) - - RemoveFileEntryTag removes a tag by key from a FileEntry. - - Returns *PackageError on failure. - - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. +- **`NewFileIndex`** - [NewFileIndex](package_file_format.md#613-newfileindex-function) + - NewFileIndex creates and returns a new FileIndex with zero values. +- **`NewPackageComment`** - [1.3.5 NewPackageComment Function](api_metadata.md#135-newpackagecomment-function) + - NewPackageComment creates and returns a new PackageComment with zero values. +- **`NewPackageHeader`** - [2.8.2 NewPackageHeader Function](package_file_format.md#282-newpackageheader-function) + - NewPackageHeader creates and returns a new PackageHeader with default values. +- **`NewPackageInfo`** - [7.1.2 NewPackageInfo Function](api_metadata.md#712-newpackageinfo-function) + - NewPackageInfo creates a new PackageInfo with default values. - **`RemovePathMetaTag`** - [Removepathmetatag](api_metadata.md#8178-removepathmetatag-function) - RemovePathMetaTag removes a tag by key from a PathMetadataEntry Returns *PackageError on failure. - **`SanitizeComment`** - [Sanitizecomment](api_metadata.md#142-sanitizecomment-function) @@ -985,25 +1036,10 @@ This subsection groups `FileEntry` methods by operational category. - If dest is a string, it MUST be parsed to determine which destination field to set. 
- If the string is a Windows-only absolute path (drive letter like "C:\\" or "C:/", or UNC path like "\\\\server\\share"), it MUST be stored as DestPathWin. - Otherwise, it MUST be stored as DestPath. -- **`SetFileEntryTag`** - [Setfileentrytag](api_file_mgmt_file_entry.md#3127-setfileentrytag-function) - - SetFileEntryTag updates an existing tag with type safety for a FileEntry. - - Returns *PackageError if the tag key does not already exist Only modifies existing tags; does not create new tags Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. -- **`SetFileEntryTags`** - [Setfileentrytags](api_file_mgmt_file_entry.md#3125-setfileentrytags-function) - - SetFileEntryTags updates existing tags from a slice of typed tags for a FileEntry. - - Returns *PackageError if any tag key does not already exist. - - Only modifies tags that already exist; does not create new tags. - - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. - **`SetPathMetaTag`** - [Setpathmetatag](api_metadata.md#8177-setpathmetatag-function) - SetPathMetaTag updates an existing tag with type safety for a PathMetadataEntry Returns *PackageError if the tag key does not already exist Only modifies existing tags; does not create new tags. - **`SetPathMetaTags`** - [Setpathmetatags](api_metadata.md#8174-setpathmetatags-function) - SetPathMetaTags updates existing tags from a slice of typed tags for a PathMetadataEntry Returns *PackageError if any tag key does not already exist Only modifies tags that already exist; does not create new tags. -- **`SyncFileEntryTags`** - [Syncfileentrytags](api_file_mgmt_file_entry.md#31211-syncfileentrytags-function) - - SyncFileEntryTags synchronizes tags with the underlying storage for a FileEntry. 
- - Returns *PackageError on failure. - - Note: This is a standalone function rather than a method due to Go's limitation of not supporting generic methods on non-generic types. - - See api_generics.md for details. - **`ValidateComment`** - [Validatecomment](api_metadata.md#141-validatecomment-function) - ValidateComment validates comment content for security issues Returns *PackageError on failure. - **`ValidateCommentEncoding`** - [Validatecommentencoding](api_metadata.md#143-validatecommentencoding-function) @@ -1011,7 +1047,7 @@ This subsection groups `FileEntry` methods by operational category. - **`ValidateSignatureComment`** - [Validatesignaturecomment](api_metadata.md#151-validatesignaturecomment-function) - ValidateSignatureComment validates signature comment for security issues Returns *PackageError on failure. -## 6. Compression Types +## 4. Compression Types - **`AdvancedCompressionStrategy`** - [2.1.3 AdvancedCompressionStrategy Interface](api_package_compression.md#213-advancedcompressionstrategy-interface) - AdvancedCompressionStrategy for compression with additional validation and metrics. @@ -1053,56 +1089,56 @@ This subsection groups `FileEntry` methods by operational category. - LZ4Strategy LZ4 compression strategy with generic support. - **`LZMAStrategy`** - [Lzmastrategy](api_package_compression.md#223-lzmastrategy-structure) - LZMAStrategy LZMA compression strategy with generic support. -- **`MemoryStrategy`** - [MemoryStrategy](api_package_compression.md#215-memorystrategy-type) - - MemoryStrategy defines the memory management approach for compression operations. - **`MemoryErrorContext`** - [Memoryerrorcontext](api_package_compression.md#14314-memoryerrorcontext-structure) - MemoryErrorContext provides error context for memory-related compression errors. +- **`MemoryStrategy`** - [MemoryStrategy](api_package_compression.md#215-memorystrategy-type) + - MemoryStrategy defines the memory management approach for compression operations. 
- **`PackageCompressionInfo`** - [1.3 PackageCompressionInfo Struct](api_package_compression.md#13-packagecompressioninfo-struct) - PackageCompressionInfo contains package compression details. - **`StreamConfig`** - [Streamconfig](api_package_compression.md#214-streamconfig-structure) - StreamConfig handles streaming compression for files of any size. - **`UnsupportedCompressionErrorContext`** - [4.3.1.3 UnsupportedCompressionErrorContext Structure](api_package_compression.md#14313-unsupportedcompressionerrorcontext-structure) - UnsupportedCompressionErrorContext provides error context for unsupported compression type errors. -- **`ZstandardStrategy`** - [Zstandardstrategy](api_package_compression.md#221-zstandardstrategy-structure) - - ZstandardStrategy Zstandard compression strategy with generic support. - -### 6.1 Compression Methods - -- **`CompressionConfigBuilder.Build`** - [CompressionConfigBuilder.Build](api_package_compression.md#9127-compressionconfigbuilderbuild-method) - - Build constructs and returns the final compression configuration. -- **`CompressionConfigBuilder.WithCompressionLevel`** - [CompressionConfigBuilder.WithCompressionLevel](api_package_compression.md#9124-compressionconfigbuilderwithcompressionlevel-method) - - WithCompressionLevel sets the compression level for the compression configuration builder. -- **`CompressionConfigBuilder.WithCompressionType`** - [CompressionConfigBuilder.WithCompressionType](api_package_compression.md#9123-compressionconfigbuilderwithcompressiontype-method) - - WithCompressionType sets the compression type for the configuration. -- **`CompressionConfigBuilder.WithMemoryStrategy`** - [CompressionConfigBuilder.WithMemoryStrategy](api_package_compression.md#9126-compressionconfigbuilderwithmemorystrategy-method) - - WithMemoryStrategy sets the memory strategy for the configuration. 
-- **`CompressionConfigBuilder.WithSolidCompression`** - [CompressionConfigBuilder.WithSolidCompression](api_package_compression.md#9125-compressionconfigbuilderwithsolidcompression-method) - - WithSolidCompression enables or disables solid compression for the configuration. +- **`ZstandardStrategy`** - [Zstandardstrategy](api_package_compression.md#221-zstandardstrategy-structure) + - ZstandardStrategy Zstandard compression strategy with generic support. + +### 4.1 Compression Methods + - **`CompressionResourcePool.AcquireCompressionResource`** - [CompressionResourcePool.AcquireCompressionResource](api_package_compression.md#8412-compressionresourcepoolacquirecompressionresource-method) - CompressionResourcePool.AcquireCompressionResource Compression-specific resource management methods. -- **`CompressionResourcePool.GetCompressionResourceStats`** - [CompressionResourcePool.GetCompressionResourceStats](api_package_compression.md#8414-compressionresourcepoolgetcompressionresourcestats-method) - - GetCompressionResourceStats returns statistics about compression resource usage. -- **`CompressionResourcePool.ReleaseCompressionResource`** - [CompressionResourcePool.ReleaseCompressionResource](api_package_compression.md#8413-compressionresourcepoolreleasecompressionresource-method) - - CompressionResourcePool.ReleaseCompressionResource Returns *PackageError on failure. - **`CompressionValidator.AddCompressionRule`** - [CompressionValidator.AddCompressionRule](api_package_compression.md#9212-compressionvalidatoraddcompressionrule-method) - AddCompressionRule adds a compression validation rule to the validator. -- **`CompressionValidator.ValidateCompressionData`** - [CompressionValidator.ValidateCompressionData](api_package_compression.md#9213-compressionvalidatorvalidatecompressiondata-method) - - CompressionValidator.ValidateCompressionData Returns *PackageError on failure. 
-- **`CompressionValidator.ValidateDecompressionData`** - [CompressionValidator.ValidateDecompressionData](api_package_compression.md#9214-compressionvalidatorvalidatedecompressiondata-method) - - CompressionValidator.ValidateDecompressionData Returns *PackageError on failure. +- **`CompressionConfigBuilder.Build`** - [CompressionConfigBuilder.Build](api_package_compression.md#9127-compressionconfigbuilderbuild-method) + - Build constructs and returns the final compression configuration. - **`CompressionWorkerPool.CompressConcurrently`** - [CompressionWorkerPool.CompressConcurrently](api_package_compression.md#822-compressionworkerpooltcompressconcurrently-method) - CompressionWorkerPool.CompressConcurrently Compression-specific methods. - **`CompressionWorkerPool.DecompressConcurrently`** - [CompressionWorkerPool.DecompressConcurrently](api_package_compression.md#823-compressionworkerpooltdecompressconcurrently-method) - DecompressConcurrently decompresses multiple data items concurrently using a worker pool. +- **`CompressionResourcePool.GetCompressionResourceStats`** - [CompressionResourcePool.GetCompressionResourceStats](api_package_compression.md#8414-compressionresourcepoolgetcompressionresourcestats-method) + - GetCompressionResourceStats returns statistics about compression resource usage. - **`CompressionWorkerPool.GetCompressionStats`** - [CompressionWorkerPool.GetCompressionStats](api_package_compression.md#824-compressionworkerpooltgetcompressionstats-method) - GetCompressionStats returns statistics about compression operations performed by the worker pool. +- **`CompressionResourcePool.ReleaseCompressionResource`** - [CompressionResourcePool.ReleaseCompressionResource](api_package_compression.md#8413-compressionresourcepoolreleasecompressionresource-method) + - CompressionResourcePool.ReleaseCompressionResource Returns *PackageError on failure. 
+- **`CompressionValidator.ValidateCompressionData`** - [CompressionValidator.ValidateCompressionData](api_package_compression.md#9213-compressionvalidatorvalidatecompressiondata-method) + - CompressionValidator.ValidateCompressionData Returns *PackageError on failure. +- **`CompressionValidator.ValidateDecompressionData`** - [CompressionValidator.ValidateDecompressionData](api_package_compression.md#9214-compressionvalidatorvalidatedecompressiondata-method) + - CompressionValidator.ValidateDecompressionData Returns *PackageError on failure. +- **`CompressionConfigBuilder.WithCompressionLevel`** - [CompressionConfigBuilder.WithCompressionLevel](api_package_compression.md#9124-compressionconfigbuilderwithcompressionlevel-method) + - WithCompressionLevel sets the compression level for the compression configuration builder. +- **`CompressionConfigBuilder.WithCompressionType`** - [CompressionConfigBuilder.WithCompressionType](api_package_compression.md#9123-compressionconfigbuilderwithcompressiontype-method) + - WithCompressionType sets the compression type for the configuration. +- **`CompressionConfigBuilder.WithMemoryStrategy`** - [CompressionConfigBuilder.WithMemoryStrategy](api_package_compression.md#9126-compressionconfigbuilderwithmemorystrategy-method) + - WithMemoryStrategy sets the memory strategy for the configuration. +- **`CompressionConfigBuilder.WithSolidCompression`** - [CompressionConfigBuilder.WithSolidCompression](api_package_compression.md#9125-compressionconfigbuilderwithsolidcompression-method) + - WithSolidCompression enables or disables solid compression for the configuration. -### 6.2 Compression Helper Functions +### 4.2 Compression Helper Functions - **`NewCompressionConfigBuilder`** - [9.1.2.2 NewCompressionConfigBuilder Function](api_package_compression.md#9122-newcompressionconfigbuilder-function) - NewCompressionConfigBuilder creates a new compression configuration builder. -## 7. Encryption and Security Types +## 5. 
Encryption and Security Types - **`AES256GCMFileHandler`** - [4.5.2.1 AES256GCMFileHandler Structure](api_security.md#4521-aes256gcmfilehandler-structure) - AES256GCMFileHandler Built-in file encryption handlers. @@ -1139,46 +1175,46 @@ This subsection groups `FileEntry` methods by operational category. - SecurityErrorContext provides typed context for security-related errors. - This context structure is used with structured errors to provide additional diagnostic information for security operations. -### 7.1 Encryption and Security Methods +### 5.1 Encryption and Security Methods +- **`EncryptionValidator.AddEncryptionRule`** - [EncryptionValidator.AddEncryptionRule](api_security.md#443-encryptionvalidatortaddencryptionrule-method) + - AddEncryptionRule adds an encryption validation rule to the validator. - **`EncryptionConfigBuilder.Build`** - [EncryptionConfigBuilder.Build](api_security.md#4327-encryptionconfigbuildertbuild-method) - Build constructs and returns the final encryption configuration. -- **`EncryptionConfigBuilder.WithAuthenticationTag`** - [EncryptionConfigBuilder.WithAuthenticationTag](api_security.md#4326-encryptionconfigbuildertwithauthenticationtag-method) - - WithAuthenticationTag enables or disables authentication tag generation for the configuration. -- **`EncryptionConfigBuilder.WithEncryptionType`** - [EncryptionConfigBuilder.WithEncryptionType](api_security.md#4323-encryptionconfigbuildertwithencryptiontype-method) - - WithEncryptionType sets the encryption type for the configuration. -- **`EncryptionConfigBuilder.WithKeySize`** - [EncryptionConfigBuilder.WithKeySize](api_security.md#4324-encryptionconfigbuildertwithkeysize-method) - - WithKeySize sets the encryption key size for the encryption configuration builder. 
-- **`EncryptionConfigBuilder.WithRandomIV`** - [EncryptionConfigBuilder.WithRandomIV](api_security.md#4325-encryptionconfigbuildertwithrandomiv-method) - - WithRandomIV enables or disables random IV generation for the configuration. +- **`MLKEMKey.Clear`** - [MLKEMKey.Clear](api_security.md#533-mlkemkeyclear-method) + - Clear clears sensitive key data from memory. +- **`MLKEMKey.Decrypt`** - [MLKEMKey.Decrypt](api_security.md#522-mlkemkeydecrypt-method) + - Decrypt decrypts ciphertext using ML-KEM key. +- **`MLKEMKey.Encrypt`** - [MLKEMKey.Encrypt](api_security.md#521-mlkemkeyencrypt-method) + - Encrypt encrypts plaintext using ML-KEM key. - **`EncryptionKey.GetKey`** - [EncryptionKey.GetKey](api_security.md#4133-encryptionkeytgetkey-method) - GetKey returns the encryption key material. +- **`MLKEMKey.GetLevel`** - [MLKEMKey.GetLevel](api_security.md#532-mlkemkeygetlevel-method) + - GetLevel returns the security level of the key. +- **`MLKEMKey.GetPublicKey`** - [MLKEMKey.GetPublicKey](api_security.md#531-mlkemkeygetpublickey-method) + - GetPublicKey returns the public key data. - **`EncryptionKey.IsExpired`** - [EncryptionKey.IsExpired](api_security.md#4136-encryptionkeytisexpired-method) - IsExpired returns true if the encryption key has expired. - **`EncryptionKey.IsValid`** - [EncryptionKey.IsValid](api_security.md#4135-encryptionkeytisvalid-method) - IsValid returns true if the encryption key is valid. - **`EncryptionKey.SetKey`** - [EncryptionKey.SetKey](api_security.md#4134-encryptionkeytsetkey-method) - SetKey sets the encryption key material. -- **`EncryptionValidator.AddEncryptionRule`** - [EncryptionValidator.AddEncryptionRule](api_security.md#443-encryptionvalidatortaddencryptionrule-method) - - AddEncryptionRule adds an encryption validation rule to the validator. 
- **`EncryptionValidator.ValidateDecryptionData`** - [EncryptionValidator.ValidateDecryptionData](api_security.md#445-encryptionvalidatortvalidatedecryptiondata-method) - EncryptionValidator.ValidateDecryptionData Returns *PackageError on failure. - **`EncryptionValidator.ValidateEncryptionData`** - [EncryptionValidator.ValidateEncryptionData](api_security.md#444-encryptionvalidatortvalidateencryptiondata-method) - EncryptionValidator.ValidateEncryptionData Returns *PackageError on failure. - **`EncryptionValidator.ValidateEncryptionKey`** - [EncryptionValidator.ValidateEncryptionKey](api_security.md#446-encryptionvalidatortvalidateencryptionkey-method) - EncryptionValidator.ValidateEncryptionKey Returns *PackageError on failure. -- **`MLKEMKey.Clear`** - [MLKEMKey.Clear](api_security.md#533-mlkemkeyclear-method) - - Clear clears sensitive key data from memory. -- **`MLKEMKey.Decrypt`** - [MLKEMKey.Decrypt](api_security.md#522-mlkemkeydecrypt-method) - - Decrypt decrypts ciphertext using ML-KEM key. -- **`MLKEMKey.Encrypt`** - [MLKEMKey.Encrypt](api_security.md#521-mlkemkeyencrypt-method) - - Encrypt encrypts plaintext using ML-KEM key. -- **`MLKEMKey.GetLevel`** - [MLKEMKey.GetLevel](api_security.md#532-mlkemkeygetlevel-method) - - GetLevel returns the security level of the key. -- **`MLKEMKey.GetPublicKey`** - [MLKEMKey.GetPublicKey](api_security.md#531-mlkemkeygetpublickey-method) - - GetPublicKey returns the public key data. +- **`EncryptionConfigBuilder.WithAuthenticationTag`** - [EncryptionConfigBuilder.WithAuthenticationTag](api_security.md#4326-encryptionconfigbuildertwithauthenticationtag-method) + - WithAuthenticationTag enables or disables authentication tag generation for the configuration. +- **`EncryptionConfigBuilder.WithEncryptionType`** - [EncryptionConfigBuilder.WithEncryptionType](api_security.md#4323-encryptionconfigbuildertwithencryptiontype-method) + - WithEncryptionType sets the encryption type for the configuration. 
+- **`EncryptionConfigBuilder.WithKeySize`** - [EncryptionConfigBuilder.WithKeySize](api_security.md#4324-encryptionconfigbuildertwithkeysize-method) + - WithKeySize sets the encryption key size for the encryption configuration builder. +- **`EncryptionConfigBuilder.WithRandomIV`** - [EncryptionConfigBuilder.WithRandomIV](api_security.md#4325-encryptionconfigbuildertwithrandomiv-method) + - WithRandomIV enables or disables random IV generation for the configuration. -### 7.2 Encryption and Security Helper Functions +### 5.2 Encryption and Security Helper Functions - **`GetEncryptionTypeName`** - [3.2.2 GetEncryptionTypeName Function](api_security.md#322-getencryptiontypename-function) - GetEncryptionTypeName returns the human-readable name of the encryption type. @@ -1189,7 +1225,7 @@ This subsection groups `FileEntry` methods by operational category. - **`NewEncryptionKey`** - [4.1.3.2 NewEncryptionKey Function](api_security.md#4132-newencryptionkey-function) - NewEncryptionKey creates a new encryption key with the specified type, ID, and key material. -## 8. Signature Types +## 6. Signature Types - **`ByteSignatureStrategy`** - [Bytesignaturestrategy](api_signatures.md#412-bytesignaturestrategy-interface) - ByteSignatureStrategy is the concrete implementation for []byte data. @@ -1217,18 +1253,34 @@ This subsection groups `FileEntry` methods by operational category. - **`ValidationErrorContext`** - [Validationerrorcontext](api_signatures.md#534-validationerrorcontext-structure) - ValidationErrorContext provides error context for signature validation errors. -### 8.1 Signature Methods +### 6.1 Signature Methods +- **`SignatureValidator.AddSignatureRule`** - [SignatureValidator.AddSignatureRule](api_signatures.md#443-signaturevalidatortaddsignaturerule-method) + - AddSignatureRule adds a signature validation rule to the validator. 
+- **`SignatureConfigBuilder.Build`** - [SignatureConfigBuilder.Build](api_signatures.md#4327-signatureconfigbuildertbuild-method) + - Build constructs and returns the final signature configuration. - **`Signature.GetData`** - [Signature.GetData](api_signatures.md#4143-signaturetgetdata-method) - GetData returns the signature data. +- **`SigningKey.GetKey`** - [SigningKey.GetKey](api_signatures.md#4133-signingkeytgetkey-method) + - GetKey returns the signing key material. - **`Signature.GetSignatureType`** - [Signature.GetSignatureType](api_signatures.md#4146-signaturetgetsignaturetype-method) - GetSignatureType returns the type of the signature. +- **`SigningKey.IsExpired`** - [SigningKey.IsExpired](api_signatures.md#4136-signingkeytisexpired-method) + - IsExpired returns true if the signing key has expired. - **`Signature.IsValid`** - [Signature.IsValid](api_signatures.md#4145-signaturetisvalid-method) - IsValid returns true if the signature is valid. +- **`SigningKey.IsValid`** - [SigningKey.IsValid](api_signatures.md#4135-signingkeytisvalid-method) + - IsValid returns true if the signing key is valid. - **`Signature.SetData`** - [Signature.SetData](api_signatures.md#4144-signaturetsetdata-method) - SetData sets the signature data. -- **`SignatureConfigBuilder.Build`** - [SignatureConfigBuilder.Build](api_signatures.md#4327-signatureconfigbuildertbuild-method) - - Build constructs and returns the final signature configuration. +- **`SigningKey.SetKey`** - [SigningKey.SetKey](api_signatures.md#4134-signingkeytsetkey-method) + - SetKey sets the signing key material. +- **`SignatureValidator.ValidateSignatureData`** - [SignatureValidator.ValidateSignatureData](api_signatures.md#444-signaturevalidatortvalidatesignaturedata-method) + - SignatureValidator.ValidateSignatureData Returns *PackageError on failure. 
+- **`SignatureValidator.ValidateSignatureFormat`** - [SignatureValidator.ValidateSignatureFormat](api_signatures.md#446-signaturevalidatortvalidatesignatureformat-method) + - SignatureValidator.ValidateSignatureFormat Returns *PackageError on failure. +- **`SignatureValidator.ValidateSignatureKey`** - [SignatureValidator.ValidateSignatureKey](api_signatures.md#445-signaturevalidatortvalidatesignaturekey-method) + - SignatureValidator.ValidateSignatureKey Returns *PackageError on failure. - **`SignatureConfigBuilder.WithKeySize`** - [SignatureConfigBuilder.WithKeySize](api_signatures.md#4324-signatureconfigbuildertwithkeysize-method) - WithKeySize sets the key size for the configuration. - **`SignatureConfigBuilder.WithMetadata`** - [SignatureConfigBuilder.WithMetadata](api_signatures.md#4326-signatureconfigbuildertwithmetadata-method) @@ -1237,24 +1289,8 @@ This subsection groups `FileEntry` methods by operational category. - WithSignatureType sets the signature type for the configuration. - **`SignatureConfigBuilder.WithTimestamp`** - [SignatureConfigBuilder.WithTimestamp](api_signatures.md#4325-signatureconfigbuildertwithtimestamp-method) - WithTimestamp enables or disables timestamp inclusion for the configuration. -- **`SignatureValidator.AddSignatureRule`** - [SignatureValidator.AddSignatureRule](api_signatures.md#443-signaturevalidatortaddsignaturerule-method) - - AddSignatureRule adds a signature validation rule to the validator. -- **`SignatureValidator.ValidateSignatureData`** - [SignatureValidator.ValidateSignatureData](api_signatures.md#444-signaturevalidatortvalidatesignaturedata-method) - - SignatureValidator.ValidateSignatureData Returns *PackageError on failure. -- **`SignatureValidator.ValidateSignatureFormat`** - [SignatureValidator.ValidateSignatureFormat](api_signatures.md#446-signaturevalidatortvalidatesignatureformat-method) - - SignatureValidator.ValidateSignatureFormat Returns *PackageError on failure. 
-- **`SignatureValidator.ValidateSignatureKey`** - [SignatureValidator.ValidateSignatureKey](api_signatures.md#445-signaturevalidatortvalidatesignaturekey-method) - - SignatureValidator.ValidateSignatureKey Returns *PackageError on failure. -- **`SigningKey.GetKey`** - [SigningKey.GetKey](api_signatures.md#4133-signingkeytgetkey-method) - - GetKey returns the signing key material. -- **`SigningKey.IsExpired`** - [SigningKey.IsExpired](api_signatures.md#4136-signingkeytisexpired-method) - - IsExpired returns true if the signing key has expired. -- **`SigningKey.IsValid`** - [SigningKey.IsValid](api_signatures.md#4135-signingkeytisvalid-method) - - IsValid returns true if the signing key is valid. -- **`SigningKey.SetKey`** - [SigningKey.SetKey](api_signatures.md#4134-signingkeytsetkey-method) - - SetKey sets the signing key material. -### 8.2 Signature Helper Functions +### 6.2 Signature Helper Functions - **`NewSignature`** - [4.1.4.2 NewSignature Function](api_signatures.md#4142-newsignature-function) - Note: Current implementation is simplified for v1 (signatures deferred to v2) Future v2 implementation: func NewSignature[T any](sigType SignatureType, data []byte) *Signature[T]. @@ -1263,7 +1299,7 @@ This subsection groups `FileEntry` methods by operational category. - **`NewSigningKey`** - [Newsigningkey](api_signatures.md#4132-newsigningkey-function) - NewSigningKey creates a new signing key with the specified type, ID, and key material. -## 9. Streaming and Buffer Types +## 7. Streaming and Buffer Types - **`BufferConfig`** - [2.2.2 BufferConfig Struct](api_streaming.md#222-bufferconfig-struct) - BufferConfig configures buffer pool behavior and limits. @@ -1288,30 +1324,30 @@ This subsection groups `FileEntry` methods by operational category. - **`StreamingWorkerPool`** - [3.2.1.1 StreamingWorkerPool Structure](api_streaming.md#3211-streamingworkerpool-structure) - StreamingWorkerPool manages concurrent streaming workers. 
-### 9.1 Streaming and Buffer Methods +### 7.1 Streaming and Buffer Methods -- **`BufferPool.Get`** - [BufferPool.Get](api_streaming.md#2213-bufferpooltget-method) - - Get retrieves a buffer of the specified size from the pool. -- **`BufferPool.GetStats`** - [BufferPool.GetStats](api_streaming.md#2314-bufferpooltgetstats-method) - - GetStats returns statistics about buffer pool usage. -- **`BufferPool.Put`** - [BufferPool.Put](api_streaming.md#2214-bufferpooltput-method) - - Put returns a buffer to the pool for reuse. -- **`BufferPool.SetMaxTotalSize`** - [BufferPool.SetMaxTotalSize](api_streaming.md#2322-bufferpooltsetmaxtotalsize-method) - - SetMaxTotalSize sets the maximum total size for all buffers in the pool. -- **`BufferPool.TotalSize`** - [BufferPool.TotalSize](api_streaming.md#2321-bufferpoolttotalsize-method) - - BufferPool.TotalSize Additional BufferPool methods. +- **`StreamingConfigBuilder.Build`** - [StreamingConfigBuilder.Build](api_streaming.md#4218-streamingconfigbuilderbuild-method) + - Build constructs and returns the final streaming configuration. - **`FileStream.Close`** - [FileStream.Close](api_streaming.md#1323-filestreamclose-method) - FileStream.Close Returns *PackageError on failure. - **`FileStream.EstimatedTimeRemaining`** - [FileStream.EstimatedTimeRemaining](api_streaming.md#1336-filestreamestimatedtimeremaining-method) - EstimatedTimeRemaining returns an estimate of the time remaining to complete the stream read. +- **`BufferPool.Get`** - [BufferPool.Get](api_streaming.md#2213-bufferpooltget-method) + - Get retrieves a buffer of the specified size from the pool. +- **`BufferPool.GetStats`** - [BufferPool.GetStats](api_streaming.md#2314-bufferpooltgetstats-method) + - GetStats returns statistics about buffer pool usage. - **`FileStream.GetStats`** - [FileStream.GetStats](api_streaming.md#1331-filestreamgetstats-method) - GetStats returns statistics about the stream's read operations. 
+- **`StreamingWorkerPool.GetStreamingStats`** - [StreamingWorkerPool.GetStreamingStats](api_streaming.md#333-streamingworkerpoolgetstreamingstats-method) + - GetStreamingStats returns current streaming worker pool statistics. - **`FileStream.IsClosed`** - [FileStream.IsClosed](api_streaming.md#1334-filestreamisclosed-method) - IsClosed returns true if the stream has been closed. - **`FileStream.Position`** - [FileStream.Position](api_streaming.md#1333-filestreamposition-method) - Position returns the current read position in the stream. - **`FileStream.Progress`** - [FileStream.Progress](api_streaming.md#1335-filestreamprogress-method) - Progress returns progress information about the stream read operation. +- **`BufferPool.Put`** - [BufferPool.Put](api_streaming.md#2214-bufferpooltput-method) + - Put returns a buffer to the pool for reuse. - **`FileStream.Read`** - [FileStream.Read](api_streaming.md#1341-filestreamread-method) - Read reads data from the stream into the provided buffer. - **`FileStream.ReadAt`** - [FileStream.ReadAt](api_streaming.md#1342-filestreamreadat-method) @@ -1320,10 +1356,18 @@ This subsection groups `FileEntry` methods by operational category. - FileStream.ReadChunk Returns *PackageError on failure. - **`FileStream.Seek`** - [FileStream.Seek](api_streaming.md#1322-filestreamseek-method) - FileStream.Seek Returns *PackageError on failure. +- **`BufferPool.SetMaxTotalSize`** - [BufferPool.SetMaxTotalSize](api_streaming.md#2322-bufferpooltsetmaxtotalsize-method) + - SetMaxTotalSize sets the maximum total size for all buffers in the pool. - **`FileStream.Size`** - [FileStream.Size](api_streaming.md#1332-filestreamsize-method) - Size returns the total size of the stream in bytes. -- **`StreamingConfigBuilder.Build`** - [StreamingConfigBuilder.Build](api_streaming.md#4218-streamingconfigbuilderbuild-method) - - Build constructs and returns the final streaming configuration. 
+- **`StreamingWorkerPool.Start`** - [StreamingWorkerPool.Start](api_streaming.md#3311-streamingworkerpoolstart-method) + - Start initializes and starts the streaming worker pool Returns *PackageError on failure. +- **`StreamingWorkerPool.Stop`** - [StreamingWorkerPool.Stop](api_streaming.md#3312-streamingworkerpoolstop-method) + - Stop gracefully shuts down the streaming worker pool Returns *PackageError on failure. +- **`StreamingWorkerPool.SubmitStreamingJob`** - [StreamingWorkerPool.SubmitStreamingJob](api_streaming.md#3321-streamingworkerpoolsubmitstreamingjob-method) + - SubmitStreamingJob submits a streaming job to the worker pool Returns *PackageError on failure. +- **`BufferPool.TotalSize`** - [BufferPool.TotalSize](api_streaming.md#2321-bufferpoolttotalsize-method) + - BufferPool.TotalSize Additional BufferPool methods. - **`StreamingConfigBuilder.WithChunkProcessingMode`** - [StreamingConfigBuilder.WithChunkProcessingMode](api_streaming.md#4215-streamingconfigbuilderwithchunkprocessingmode-method) - WithChunkProcessingMode sets the chunk processing mode for the configuration. - **`StreamingConfigBuilder.WithMaxStreamsPerWorker`** - [StreamingConfigBuilder.WithMaxStreamsPerWorker](api_streaming.md#4216-streamingconfigbuilderwithmaxstreamsperworker-method) @@ -1332,16 +1376,8 @@ This subsection groups `FileEntry` methods by operational category. - WithStreamBufferSize sets the stream buffer size for the configuration. - **`StreamingConfigBuilder.WithStreamTimeout`** - [StreamingConfigBuilder.WithStreamTimeout](api_streaming.md#4217-streamingconfigbuilderwithstreamtimeout-method) - WithStreamTimeout sets the stream timeout for the configuration. -- **`StreamingWorkerPool.GetStreamingStats`** - [StreamingWorkerPool.GetStreamingStats](api_streaming.md#333-streamingworkerpoolgetstreamingstats-method) - - GetStreamingStats returns current streaming worker pool statistics. 
-- **`StreamingWorkerPool.Start`** - [StreamingWorkerPool.Start](api_streaming.md#3311-streamingworkerpoolstart-method) - - Start initializes and starts the streaming worker pool Returns *PackageError on failure. -- **`StreamingWorkerPool.Stop`** - [StreamingWorkerPool.Stop](api_streaming.md#3312-streamingworkerpoolstop-method) - - Stop gracefully shuts down the streaming worker pool Returns *PackageError on failure. -- **`StreamingWorkerPool.SubmitStreamingJob`** - [StreamingWorkerPool.SubmitStreamingJob](api_streaming.md#3321-streamingworkerpoolsubmitstreamingjob-method) - - SubmitStreamingJob submits a streaming job to the worker pool Returns *PackageError on failure. -### 9.2 Streaming and Buffer Helper Functions +### 7.2 Streaming and Buffer Helper Functions - **`CreateStreamingConfig`** - [4.3.2 CreateStreamingConfig Function](api_streaming.md#432-createstreamingconfig-function) - CreateStreamingConfig creates a streaming configuration with intelligent defaults. @@ -1360,29 +1396,13 @@ This subsection groups `FileEntry` methods by operational category. - **`ValidateStreamingConfig`** - [Validatestreamingconfig](api_streaming.md#433-validatestreamingconfig-function) - ValidateStreamingConfig validates streaming configuration settings Returns *PackageError on failure. -## 10. Deduplication Types - -This section groups deduplication-related types, methods, and helpers. For details, see [Deduplication API](api_deduplication.md). - -### 10.1 Deduplication Methods - -This subsection groups deduplication methods (for example, duplicate detection and conversion workflows). - -### 10.2 Deduplication Helper Functions - -This subsection groups helper functions used by deduplication operations. - -## 11. FileType System Types +## 8. FileType System Types - **`FileType`** - [3.1 FileType Type](file_type_system.md#31-filetype-type) - FileType represents a file type identifier Note: This is the authoritative definition. - All other references should link to this document. 
-### 11.1 FileType System Methods - -This subsection groups methods related to file type classification and file type => compression selection logic. - -### 11.2 FileType System Helper Functions +### 8.1 FileType System Helper Functions - **`DetermineFileType`** - [Determinefiletype](file_type_system.md#411-determinefiletype-function-detection-process) - DetermineFileType uses a sophisticated multi-stage detection process to identify file types. @@ -1410,12 +1430,9 @@ This subsection groups methods related to file type classification and file type - Special file handling: Uses IsSpecialFile() to check for special file types. - FileTypeSignature: Never compress signature files (returns CompressionNone). - Special file handling: FileTypeMetadata, FileTypeManifest, FileTypeIndex: Always compress YAML special files (returns CompressionZstd). - - Other special files: Default compression (returns CompressionZstd). - Text-based files: Returns CompressionZstd for text, script, and config files (good compression for text). - Binary media files: Returns CompressionLZ4 for image, audio, and video files (fast compression for binary data). - Default: Returns CompressionZstd as default compression method. + - Other special files: Default compression (returns CompressionZstd). Text-based files: Returns CompressionZstd for text, script, and config files (good compression for text). Binary media files: Returns CompressionLZ4 for image, audio, and video files (fast compression for binary data). Default: Returns CompressionZstd as default compression method. -## 12. Generic Types +## 9. Generic Types - **`ConcurrencyConfig`** - [1.8.4 ConcurrencyConfig Structure](api_generics.md#184-concurrencyconfig-structure) - ConcurrencyConfig defines worker pool and thread safety settings. 
@@ -1444,58 +1461,60 @@ This subsection groups methods related to file type classification and file type - **`WorkerPool`** - [1.8.1 WorkerPool Structure](api_generics.md#181-workerpool-structure) - WorkerPool manages concurrent workers for any data type. -### 12.1 Generic Methods +### 9.1 Generic Methods -- **`ConfigBuilder.Build`** - [ConfigBuilder.Build](api_generics.md#11027-configbuildertbuild-method) +- **`ConfigBuilder.Build`** - [ConfigBuilder.Build](api_generics.md#11028-configbuildertbuild-method) - Build constructs and returns the final configuration. -- **`ConfigBuilder.WithChunkSize`** - [ConfigBuilder.WithChunkSize](api_generics.md#11023-configbuildertwithchunksize-method) - - WithChunkSize sets the chunk size for the configuration. -- **`ConfigBuilder.WithCompressionLevel`** - [ConfigBuilder.WithCompressionLevel](api_generics.md#11025-configbuildertwithcompressionlevel-method) - - WithCompressionLevel sets the compression level for the configuration. -- **`ConfigBuilder.WithMemoryUsage`** - [ConfigBuilder.WithMemoryUsage](api_generics.md#11024-configbuildertwithmemoryusage-method) - - WithMemoryUsage sets the memory usage limit for the configuration. -- **`ConfigBuilder.WithStrategy`** - [ConfigBuilder.WithStrategy](api_generics.md#11026-configbuildertwithstrategy-method) - - WithStrategy sets the processing strategy for the configuration. - **`Option.Clear`** - [Option.Clear](api_generics.md#116-optiontclear-method) - Clear clears the option value. - **`Option.Get`** - [Option.Get](api_generics.md#113-optiontget-method) - Get returns the value and a boolean indicating if the value is set. - **`Option.GetOrDefault`** - [Option.GetOrDefault](api_generics.md#114-optiontgetordefault-method) - GetOrDefault returns the value if set, otherwise returns the default value. -- **`Option.IsSet`** - [Option.IsSet](api_generics.md#115-optiontisset-method) - - IsSet returns true if the option has a value set. 
-- **`Option.Set`** - [Option.Set](api_generics.md#112-optiontset-method) - - Set sets the option value. - **`PathEntry.GetPath`** - [PathEntry.GetPath](api_generics.md#1316-pathentrygetpath-method) - GetPath returns the path string as stored (Unix-style with forward slashes). - **`PathEntry.GetPathForPlatform`** - [PathEntry.GetPathForPlatform](api_generics.md#1317-pathentrygetpathforplatform-method) - GetPathForPlatform returns the path string converted for the specified platform On Windows, converts forward slashes to backslashes On Unix/Linux, returns the path as stored (with forward slashes). -- **`PathEntry.ReadFrom`** - [PathEntry.ReadFrom](api_generics.md#1314-pathentryreadfrom-method) - - ReadFrom reads a PathEntry from the provided io.Reader Implements io.ReaderFrom interface Returns number of bytes read and any error encountered. -- **`PathEntry.Size`** - [PathEntry.Size](api_generics.md#1313-pathentrysize-method) - - Size returns the total size of the PathEntry in bytes Formula: 2 (PathLength) + PathLength (Path). -- **`PathEntry.Validate`** - [PathEntry.Validate](api_generics.md#1312-pathentryvalidate-method) - - Validate performs validation checks on the PathEntry Returns error if PathLength doesn't match Path length, or if Path is empty/invalid. -- **`PathEntry.WriteTo`** - [PathEntry.WriteTo](api_generics.md#1315-pathentrywriteto-method) - - WriteTo writes a PathEntry to the provided io.Writer Implements io.WriterTo interface Returns number of bytes written and any error encountered. +- **`WorkerPool.GetWorkerStats`** - [WorkerPool.GetWorkerStats](api_generics.md#194-workerpooltgetworkerstats-method) + - GetWorkerStats returns current worker pool statistics. - **`Result.IsErr`** - [Result.IsErr](api_generics.md#126-resulttiserr-method) - IsErr returns true if the Result contains an error. - **`Result.IsOk`** - [Result.IsOk](api_generics.md#125-resulttisok-method) - IsOk returns true if the Result contains a value (no error). 
-- **`Result.Unwrap`** - [Result.Unwrap](api_generics.md#124-resulttunwrap-method) - - Unwrap returns the value and error from the Result. -- **`ValidationRule.Validate`** - [ValidationRule.Validate](api_generics.md#1722-validationruletvalidate-method) - - ValidationRule.Validate Returns *PackageError on failure. -- **`WorkerPool.GetWorkerStats`** - [WorkerPool.GetWorkerStats](api_generics.md#194-workerpooltgetworkerstats-method) - - GetWorkerStats returns current worker pool statistics. +- **`Option.IsSet`** - [Option.IsSet](api_generics.md#115-optiontisset-method) + - IsSet returns true if the option has a value set. +- **`PathEntry.ReadFrom`** - [PathEntry.ReadFrom](api_generics.md#1314-pathentryreadfrom-method) + - ReadFrom reads a PathEntry from the provided io.Reader Implements io.ReaderFrom interface Returns number of bytes read and any error encountered. +- **`Option.Set`** - [Option.Set](api_generics.md#112-optiontset-method) + - Set sets the option value. +- **`PathEntry.Size`** - [PathEntry.Size](api_generics.md#1313-pathentrysize-method) + - Size returns the total size of the PathEntry in bytes Formula: 2 (PathLength) + PathLength (Path). - **`WorkerPool.Start`** - [WorkerPool.Start](api_generics.md#191-workerpooltstart-method) - Start initializes and starts the worker pool Returns *PackageError on failure. - **`WorkerPool.Stop`** - [WorkerPool.Stop](api_generics.md#192-workerpooltstop-method) - Stop gracefully shuts down the worker pool Returns *PackageError on failure. - **`WorkerPool.SubmitJob`** - [WorkerPool.SubmitJob](api_generics.md#193-workerpooltsubmitjob-method) - SubmitJob submits a job to the worker pool Returns *PackageError on failure. +- **`Result.Unwrap`** - [Result.Unwrap](api_generics.md#124-resulttunwrap-method) + - Unwrap returns the value and error from the Result. 
+- **`PathEntry.Validate`** - [PathEntry.Validate](api_generics.md#1312-pathentryvalidate-method) + - Validate performs validation checks on the PathEntry Returns error if PathLength doesn't match Path length, or if Path is empty/invalid. +- **`ValidationRule.Validate`** - [ValidationRule.Validate](api_generics.md#1722-validationruletvalidate-method) + - ValidationRule.Validate Returns *PackageError on failure. +- **`ConfigBuilder.WithChunkSize`** - [ConfigBuilder.WithChunkSize](api_generics.md#11023-configbuildertwithchunksize-method) + - WithChunkSize sets the chunk size for the configuration. +- **`ConfigBuilder.WithCompressionLevel`** - [ConfigBuilder.WithCompressionLevel](api_generics.md#11025-configbuildertwithcompressionlevel-method) + - WithCompressionLevel sets the compression level for the configuration. +- **`ConfigBuilder.WithMemoryUsage`** - [ConfigBuilder.WithMemoryUsage](api_generics.md#11024-configbuildertwithmemoryusage-method) + - WithMemoryUsage sets the memory usage limit for the configuration. +- **`ConfigBuilder.WithStrategy`** - [ConfigBuilder.WithStrategy](api_generics.md#11026-configbuildertwithstrategy-method) + - WithStrategy sets the processing strategy for the configuration. +- **`ConfigBuilder.WithValidator`** - [ConfigBuilder.WithValidator](api_generics.md#11027-configbuildertwithvalidator-method) + - WithValidator sets the builder validator for configuration validation. +- **`PathEntry.WriteTo`** - [PathEntry.WriteTo](api_generics.md#1315-pathentrywriteto-method) + - WriteTo writes a PathEntry to the provided io.Writer Implements io.WriterTo interface Returns number of bytes written and any error encountered. -### 12.2 Generic Helper Functions +### 9.2 Generic Helper Functions - **`ComposeValidators`** - [ComposeValidators](api_generics.md#223-composevalidators-function) - ComposeValidators creates a validator that runs multiple validators. 
@@ -1512,7 +1531,7 @@ This subsection groups methods related to file type classification and file type - **`ValidateWith`** - [ValidateWith](api_generics.md#221-validatewith-function) - ValidateWith validates a single value using a validator. -## 13. Error Types +## 10. Error Types - **`ErrorType`** - [10.2 ErrorType Types and Categories](api_core.md#102-errortype-types-and-categories) - ErrorType categorizes errors for programmatic handling. @@ -1527,16 +1546,20 @@ This subsection groups methods related to file type classification and file type - **`ReadOnlyErrorContext`** - [11.5 ReadOnlyErrorContext Structure](api_basic_operations.md#115-readonlyerrorcontext-structure) - ReadOnlyErrorContext provides typed context for read-only enforcement errors. -### 13.1 Error Methods +### 10.1 Error Methods -- **`PackageError.Is`** - [10.4.3 PackageError.Is Method](api_core.md#1043-packageerroris-method) - - Is implements error matching for error comparison. - **`PackageError.Error`** - [10.4.1 PackageError.Error Method](api_core.md#1041-packageerrorerror-method) - Error returns the formatted error string. +- **`PackageError.Is`** - [10.4.3 PackageError.Is Method](api_core.md#1043-packageerroris-method) + - Is implements error matching for error comparison. +- **`ErrorType.String`** - [10.2.1 ErrorType.String Method](api_core.md#1021-errortypestring-method) + - String returns a human-readable name for the error type. - **`PackageError.Unwrap`** - [10.4.2 PackageError.Unwrap Method](api_core.md#1042-packageerrorunwrap-method) - Unwrap returns the underlying cause error. +- **`PackageError.WithContext`** - [10.4.4 PackageError.WithContext Method](api_core.md#1044-packageerrorwithcontext-method) + - WithContext adds a key/value context entry and returns the updated error. -### 13.2 Error Helper Functions +### 10.2 Error Helper Functions - **`AddErrorContext`** - [Adderrorcontext](api_core.md#1055-adderrorcontext-function) - AddErrorContext adds type-safe context to errors. 
@@ -1546,56 +1569,7 @@ This subsection groups methods related to file type classification and file type - GetErrorContext retrieves type-safe context from errors. - **`MapError`** - [Maperror](api_core.md#1056-maperror-function) - MapError transforms an error with a generic mapper function. +- **`NewPackageError`** - [10.5.1 NewPackageError Function](api_core.md#1051-newpackageerror-function) + - NewPackageError creates a structured error with type-safe context All errors must include typed context for type safety. - **`WrapErrorWithContext`** - [Wraperrorwithcontext](api_core.md#1052-wraperrorwithcontext-function) - WrapErrorWithContext wraps an error with type-safe context. - -## 14. Other Types - -- **`AddFileOptions`** - [2.8 AddFileOptions Struct](api_file_mgmt_addition.md#28-addfileoptions-struct) - - AddFileOptions configures file addition behavior (path determination, metadata preservation, and processing options). -- **`CreateOptions`** - [7.6 CreateOptions Structure](api_basic_operations.md#76-createoptions-structure) - - CreateOptions configures package creation behavior (including initial metadata and settings). -- **`DestPathSpec`** - [DestPathSpec](api_file_mgmt_extraction.md#1511-destpathspec-struct) - - DestPathSpec configures a destination path override for extraction. -- **`ExtractPathOptions`** - [2. ExtractPathOptions Struct](api_file_mgmt_extraction.md#2-extractpathoptions-struct) - - ExtractPathOptions configures filesystem extraction behavior. -- **`FileIndex`** - [6.1.2 FileIndex Struct](package_file_format.md#612-fileindex-struct) - - FileIndex represents the file index section of a package. -- **`FileInfo`** - [1.2.4 FileInfo Structure](api_core.md#124-fileinfo-structure) - - FileInfo provides lightweight file information for listing operations. -- **`IndexEntry`** - [6.1.1 IndexEntry Struct](package_file_format.md#611-indexentry-struct) - - IndexEntry represents a single file index entry. 
-- **`PackageConfig`** - [9.1 PackageConfig Structure](api_basic_operations.md#91-packageconfig-structure) - - PackageConfig provides package-level configuration for path handling behavior. -- **`PathHandling`** - [9.3 PathHandling Type](api_basic_operations.md#93-pathhandling-type) - - PathHandling specifies how to handle multiple paths pointing to the same content. -- **`RecoveryFileHeader`** - [RecoveryFileHeader](api_writing.md#2721-recoveryfileheader-structure) - - RecoveryFileHeader contains header information for recovery files used by writing operations. -- **`RemoveDirectoryOptions`** - [4.4 RemoveDirectoryOptions Struct](api_file_mgmt_removal.md#44-removedirectoryoptions-struct) - - RemoveDirectoryOptions configures directory removal behavior. -- **`SymlinkConvertOptions`** - [1.7 SymlinkConvertOptions Struct](api_file_mgmt_updates.md#17-symlinkconvertoptions-struct) - - SymlinkConvertOptions configures path-to-symlink conversion behavior. -- **`TransformPipeline`** - [2.2 TransformPipeline Structure](api_file_mgmt_transform_pipelines.md#22-transformpipeline-structure) - - TransformPipeline tracks a multi-stage transformation pipeline for large or multi-step operations. -- **`TransformType`** - [2.4 TransformType Type](api_file_mgmt_transform_pipelines.md#24-transformtype-type) - - TransformType identifies the type of a transformation stage (compress, encrypt, etc.). -- **`readOnlyPackage`** - [11.3 readOnlyPackage Struct](api_basic_operations.md#113-readonlypackage-struct) - - readOnlyPackage is a wrapper that enforces read-only behavior for a Package. - -### 14.1 Other Type Methods - -- **`readOnlyPackage.readOnlyError`** - [readOnlyPackage.readOnlyError](api_basic_operations.md#114-readonlypackagereadonlyerror-method) - - readOnlyError creates a structured security error for read-only enforcement. - -## 15. General Helper Functions - -This section groups general-purpose helper functions referenced across the specs. 
- -### 15.1 General Validation Functions - -This subsection groups validation helpers (for example, format and input validation utilities). - -### 15.2 General Utility Functions - -- **`NewFileIndex`** - [NewFileIndex](package_file_format.md#613-newfileindex-function) - - NewFileIndex creates and returns a new FileIndex with zero values. diff --git a/docs/tech_specs/api_metadata.md b/docs/tech_specs/api_metadata.md index 5711d491..af5f1d31 100644 --- a/docs/tech_specs/api_metadata.md +++ b/docs/tech_specs/api_metadata.md @@ -117,10 +117,11 @@ - [8.1.6 ACLEntry Structure](#816-aclentry-structure) - [8.1.7 PathMetadataEntry Tag Management](#817-pathmetadataentry-tag-management) - [8.1.8 PathMetadataEntry Methods](#818-pathmetadataentry-methods) - - [8.1.9 `PathInfo` Structure](#819-pathinfo-structure) - - [8.1.10 `FilePathAssociation` Structure](#8110-filepathassociation-structure) - - [8.1.11 `DestPathOverride` Structure](#8111-destpathoverride-structure) - - [8.1.12 `DestPathInput` Interface](#8112-destpathinput-interface) + - [8.1.9 PathMetadataEntry Validation Methods](#819-pathmetadataentry-validation-methods) + - [8.1.10 `PathInfo` Structure](#8110-pathinfo-structure) + - [8.1.11 `FilePathAssociation` Structure](#8111-filepathassociation-structure) + - [8.1.12 `DestPathOverride` Structure](#8112-destpathoverride-structure) + - [8.1.13 `DestPathInput` Interface](#8113-destpathinput-interface) - [8.2 `PathMetadata` Management Methods](#82-pathmetadata-management-methods) - [8.2.1 Core `PathMetadata` CRUD Operations](#821-core-pathmetadata-crud-operations) - [Path Information Query Methods](#path-information-query-methods) @@ -1654,7 +1655,18 @@ func (pme *PathMetadataEntry) AssociateWithFileEntry(fe *FileEntry) error func (pme *PathMetadataEntry) GetAssociatedFileEntries() []*FileEntry ``` -#### 8.1.9 PathInfo Structure +#### 8.1.9 PathMetadataEntry Validation Methods + +This section defines validation behavior for PathMetadataEntry instances. 
+ +##### 8.1.9.1 PathMetadataEntry.Validate Method + +```go +// Validate validates the PathMetadataEntry state and returns an error on failure. +func (pme *PathMetadataEntry) Validate() error +``` + +#### 8.1.10 PathInfo Structure ```go // PathInfo provides runtime path metadata information @@ -1667,7 +1679,7 @@ type PathInfo struct { } ``` -#### 8.1.10 FilePathAssociation Structure +#### 8.1.11 FilePathAssociation Structure ```go // FilePathAssociation links files to their path metadata @@ -1680,7 +1692,7 @@ type FilePathAssociation struct { } ``` -#### 8.1.11 DestPathOverride Structure +#### 8.1.12 DestPathOverride Structure ```go // DestPathOverride specifies destination extraction directory overrides. @@ -1692,7 +1704,7 @@ type DestPathOverride struct { } ``` -#### 8.1.12 DestPathInput Interface +#### 8.1.13 DestPathInput Interface ```go // DestPathInput is the allowed input type set for SetDestPath. diff --git a/docs/tech_specs/file_validation.md b/docs/tech_specs/file_validation.md index 5631e7ea..f9e5f5d4 100644 --- a/docs/tech_specs/file_validation.md +++ b/docs/tech_specs/file_validation.md @@ -49,7 +49,7 @@ This section describes file validation requirements for packages. - **Path normalization:** Paths are normalized according to [Path Rules](api_core.md#22-path-rules) (separators normalized to `/`, dot segments converted to canonical paths) - **Standardized path format:** All paths are stored in a consistent, normalized format as specified in [Package Path Semantics](api_core.md#2-package-path-semantics) - **Cross-platform compatibility:** Paths are handled consistently regardless of input platform per [Package Path Semantics](api_core.md#2-package-path-semantics) -- **Path length:** Path length limits and portability warnings are specified in [api_core.md Path Length Limits](api_core.md#215-path-length-limits) and [ValidatePathLength Function](api_core.md#124-validatepathlength-function). 
**Go API**: `novuspack.ValidatePathLength(path string) ([]string, error)`. See [api_go_defs_index 5.4](api_go_defs_index.md#151-general-validation-functions). +- **Path length:** Path length limits and portability warnings are specified in [api_core.md Path Length Limits](api_core.md#215-path-length-limits) and [ValidatePathLength Function](api_core.md#124-validatepathlength-function). **Go API**: `novuspack.ValidatePathLength(path string) ([]string, error)`. See [Go API Definitions Index - Package Helper Functions](api_go_defs_index.md#119-package-helper-functions). ### 1.4 Transparency Requirements From 55982aef06de20d9d5b0025f2214b6de76c56b61 Mon Sep 17 00:00:00 2001 From: Andre Date: Tue, 3 Feb 2026 03:39:09 -0500 Subject: [PATCH 5/7] fix(api/go): stabilize package I/O and metadata handling Align Go implementation and tests with the updated package read/write API surface, including path metadata handling, file index/header behavior, signatures, and supporting helpers. --- api/go/fileformat/file_format_constants.go | 2 +- api/go/fileformat/fileindex.go | 176 +-- api/go/fileformat/fileindex_test.go | 221 ++-- api/go/fileformat/header.go | 28 +- api/go/fileformat/header_test.go | 541 ++++----- api/go/fileformat/testutil/helpers.go | 20 +- api/go/fileformat/testutil/helpers_test.go | 25 +- api/go/generics/concurrency.go | 52 +- api/go/generics/config_test.go | 53 +- api/go/generics/core_test.go | 142 +-- api/go/generics/pathentry_test.go | 8 + api/go/generics/patterns_test.go | 2 +- api/go/generics/tag.go | 2 +- api/go/generics/tag_test.go | 66 +- api/go/generics/validation.go | 81 +- api/go/generics/validation_test.go | 8 +- api/go/internal/helpers.go | 76 +- api/go/internal/helpers_test.go | 152 +-- api/go/internal/testhelpers/context_test.go | 40 +- api/go/internal/testhelpers/io_test.go | 128 +-- api/go/internal/testhelpers/strings_test.go | 1 + api/go/metadata/comment.go | 74 +- api/go/metadata/comment_constants.go | 4 +- api/go/metadata/comment_test.go | 69 
+- api/go/metadata/entry_io_helpers_test.go | 88 ++ api/go/metadata/fileentry.go | 194 ++-- api/go/metadata/fileentry_data.go | 62 +- api/go/metadata/fileentry_data_test.go | 182 +-- api/go/metadata/fileentry_directory.go | 6 +- api/go/metadata/fileentry_directory_test.go | 158 +-- api/go/metadata/fileentry_marshal.go | 89 +- api/go/metadata/fileentry_marshal_test.go | 54 +- api/go/metadata/fileentry_path_test.go | 78 +- api/go/metadata/fileentry_tags.go | 156 +-- api/go/metadata/fileentry_tags_test.go | 1015 ++++++----------- api/go/metadata/fileentry_test.go | 219 ++-- api/go/metadata/hashentry.go | 81 +- api/go/metadata/hashentry_test.go | 178 +-- api/go/metadata/hashentry_validate_test.go | 39 +- api/go/metadata/length_prefixed_io.go | 68 ++ api/go/metadata/optionaldata.go | 81 +- api/go/metadata/optionaldata_test.go | 179 +-- api/go/metadata/optionaldata_validate_test.go | 43 +- api/go/metadata/package_info_test.go | 6 + api/go/metadata/package_metadata.go | 14 +- api/go/metadata/package_metadata_test.go | 63 +- .../metadata/path_metadata_entry_methods.go | 4 + api/go/metadata/path_metadata_entry_tags.go | 11 +- api/go/metadata/path_metadata_entry_test.go | 33 +- .../path_metadata_inheritance_fixture_test.go | 22 + api/go/metadata/slice_validation.go | 31 + api/go/metadata/tags_filter.go | 19 + api/go/metadata/validate_table_test.go | 27 + api/go/novus_package/package.go | 51 +- api/go/novus_package/package_builder.go | 7 - api/go/novus_package/package_builder_test.go | 38 +- api/go/novus_package/package_comment.go | 87 +- api/go/novus_package/package_comment_test.go | 64 +- api/go/novus_package/package_file_lookup.go | 4 +- .../novus_package/package_file_management.go | 170 ++- .../package_file_management_test.go | 102 +- api/go/novus_package/package_identity.go | 30 +- api/go/novus_package/package_identity_test.go | 585 ++++------ api/go/novus_package/package_lifecycle.go | 221 +++- .../novus_package/package_lifecycle_test.go | 419 +++---- 
..._path_canonicalization_integration_test.go | 9 +- .../package_path_metadata_associations.go | 6 +- ...package_path_metadata_associations_test.go | 41 +- .../package_path_metadata_directories.go | 2 + .../package_path_metadata_directories_test.go | 156 +-- .../package_path_metadata_files.go | 2 +- .../package_path_metadata_files_test.go | 6 +- .../package_path_metadata_helpers.go | 14 +- .../package_path_metadata_hierarchy_test.go | 91 +- .../package_path_metadata_test.go | 63 +- api/go/novus_package/package_reader.go | 117 +- .../package_reader_additional_test.go | 66 +- .../package_reader_coverage_test.go | 75 +- api/go/novus_package/package_reader_test.go | 232 +--- api/go/novus_package/package_session.go | 10 +- api/go/novus_package/package_session_test.go | 10 +- .../novus_package/package_target_path_test.go | 6 +- api/go/novus_package/package_test.go | 345 +++++- api/go/novus_package/package_types.go | 49 +- .../package_version_tracking_test.go | 9 +- .../package_write_integration_test.go | 89 +- api/go/novus_package/package_writer.go | 214 +++- .../package_writer_comprehensive_test.go | 185 +-- .../package_writer_edge_cases_test.go | 67 +- api/go/novus_package/package_writer_test.go | 183 +-- api/go/novuspack.go | 35 +- api/go/novuspack_test.go | 44 +- api/go/pkgerrors/pkgerrors_test.go | 202 ++-- api/go/signatures/signature.go | 358 +++--- api/go/signatures/signature_test.go | 221 ++-- 95 files changed, 4170 insertions(+), 5686 deletions(-) create mode 100644 api/go/metadata/entry_io_helpers_test.go create mode 100644 api/go/metadata/length_prefixed_io.go create mode 100644 api/go/metadata/path_metadata_inheritance_fixture_test.go create mode 100644 api/go/metadata/slice_validation.go create mode 100644 api/go/metadata/tags_filter.go create mode 100644 api/go/metadata/validate_table_test.go diff --git a/api/go/fileformat/file_format_constants.go b/api/go/fileformat/file_format_constants.go index b9efffba..28ce3b04 100644 --- 
a/api/go/fileformat/file_format_constants.go +++ b/api/go/fileformat/file_format_constants.go @@ -4,7 +4,7 @@ // only constant definitions used throughout the fileformat package and re-exported // by the main novuspack package. // -// Specification: package_file_format.md: 1 `.nvpk` File Format Overview +// Specification: package_file_format.md: 1 1. .Nvpk File Format Overview // Package novuspack provides the core NovusPack file format implementation. // diff --git a/api/go/fileformat/fileindex.go b/api/go/fileformat/fileindex.go index fa13e492..db55b803 100644 --- a/api/go/fileformat/fileindex.go +++ b/api/go/fileformat/fileindex.go @@ -64,7 +64,7 @@ type FileIndex struct { // - All FileIDs must be unique and non-zero // // Returns an error if any validation check fails. -func (f *FileIndex) Validate() error { +func (f *FileIndex) validate() error { if f.Reserved != 0 { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "reserved field must be zero", nil, pkgerrors.ValidationErrorContext{ Field: "Reserved", @@ -115,7 +115,7 @@ func (f *FileIndex) Validate() error { // Size returns the total size of the FileIndex in bytes. // // Specification: package_file_format.md: 6 File Index Section -func (f *FileIndex) Size() int { +func (f *FileIndex) size() int { return 16 + (IndexEntrySize * len(f.Entries)) } @@ -138,152 +138,94 @@ func NewFileIndex() *FileIndex { // // Returns the number of bytes read and any error encountered. // -// Specification: package_file_format.md: 6 File Index Section -func (f *FileIndex) ReadFrom(r io.Reader) (int64, error) { - var totalRead int64 - - // Read header (16 bytes) - var entryCount uint32 - if err := binary.Read(r, binary.LittleEndian, &entryCount); err != nil { +// readFileIndexHeader reads the 16-byte header and returns entryCount, reserved, firstEntryOffset, totalRead, error. 
+func readFileIndexHeader(r io.Reader) (entryCount, reserved uint32, firstEntryOffset uint64, totalRead int64, err error) { + if err = binary.Read(r, binary.LittleEndian, &entryCount); err != nil { if err == io.EOF || err == io.ErrUnexpectedEOF { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to read entry count: incomplete data", pkgerrors.ValidationErrorContext{ - Field: "EntryCount", - Value: totalRead, - Expected: "4 bytes", + return 0, 0, 0, 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to read entry count: incomplete data", pkgerrors.ValidationErrorContext{ + Field: "EntryCount", Value: int64(0), Expected: "4 bytes", }) } - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read entry count", pkgerrors.ValidationErrorContext{ - Field: "EntryCount", - Value: nil, - Expected: "4 bytes", + return 0, 0, 0, 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read entry count", pkgerrors.ValidationErrorContext{ + Field: "EntryCount", Value: nil, Expected: "4 bytes", }) } - totalRead += 4 - f.EntryCount = entryCount - - var reserved uint32 - if err := binary.Read(r, binary.LittleEndian, &reserved); err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read reserved", pkgerrors.ValidationErrorContext{ - Field: "Reserved", - Value: nil, - Expected: "4 bytes", + totalRead = 4 + if err = binary.Read(r, binary.LittleEndian, &reserved); err != nil { + return 0, 0, 0, totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read reserved", pkgerrors.ValidationErrorContext{ + Field: "Reserved", Value: nil, Expected: "4 bytes", }) } totalRead += 4 - f.Reserved = reserved - - var firstEntryOffset uint64 - if err := binary.Read(r, binary.LittleEndian, &firstEntryOffset); err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read first 
entry offset", pkgerrors.ValidationErrorContext{ - Field: "FirstEntryOffset", - Value: nil, - Expected: "8 bytes", + if err = binary.Read(r, binary.LittleEndian, &firstEntryOffset); err != nil { + return 0, 0, 0, totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read first entry offset", pkgerrors.ValidationErrorContext{ + Field: "FirstEntryOffset", Value: nil, Expected: "8 bytes", }) } totalRead += 8 - f.FirstEntryOffset = firstEntryOffset - - // Read entries - // Validate entry count before allocation to prevent OOM from corrupted/malicious files - // Specification: package_file_format.md: 6.1 File Index Structure - // EntryCount is a uint32 (4 bytes) as per specification, maximum value: 4,294,967,295 - // Note: Since entryCount is already uint32, it cannot exceed this limit by type definition, - // but we validate against practical limits below to prevent OOM attacks. + return entryCount, reserved, firstEntryOffset, totalRead, nil +} - // Check if allocation size would overflow or exceed maximum slice size - // maxInt represents the maximum slice length in Go (architecture-dependent) - // On 64-bit systems: 2^63 - 1 = 9,223,372,036,854,775,807 - // On 32-bit systems: 2^31 - 1 = 2,147,483,647 - // Note: entryCount is uint32, so on 64-bit systems this check will never trigger - // (max uint32 = 4,294,967,295 < maxInt on 64-bit), but it's kept for correctness - // and potential future use with larger integer types. - const maxInt = int(^uint(0) >> 1) // Maximum value for int (architecture-dependent) +// validateEntryCountAllocation checks entryCount against allocation limits; returns an error if allocation would be unsafe. 
+func validateEntryCountAllocation(entryCount uint32, totalRead int64) error { + const maxInt = int(^uint(0) >> 1) if int(entryCount) > maxInt { - return totalRead, pkgerrors.WrapErrorWithContext( + return pkgerrors.WrapErrorWithContext( fmt.Errorf("entry count %d exceeds maximum slice size %d", entryCount, maxInt), - pkgerrors.ErrTypeValidation, - "entry count exceeds system allocation limits", - pkgerrors.ValidationErrorContext{ - Field: "EntryCount", - Value: entryCount, - Expected: fmt.Sprintf("value <= %d", maxInt), - }, + pkgerrors.ErrTypeValidation, "entry count exceeds system allocation limits", + pkgerrors.ValidationErrorContext{Field: "EntryCount", Value: entryCount, Expected: fmt.Sprintf("value <= %d", maxInt)}, ) } - - // Calculate total required bytes for allocation - // Each IndexEntry is 16 bytes - // Note: uint32 max * 16 = 68,719,476,720 bytes (~64 GB), which is well within uint64 range - // so overflow in this multiplication is impossible requiredBytes := uint64(entryCount) * uint64(IndexEntrySize) - - // Check if required bytes would exceed maximum slice size in bytes - // This checks if entryCount * IndexEntrySize would exceed maxInt bytes - // On 64-bit: maxInt/16 = 576,460,752,303,423,487, so this check never triggers - // (max uint32 = 4,294,967,295 is way smaller) - // On 32-bit: maxInt/16 = 134,217,727, so this check catches values in range - // 134,217,728 to 2,147,483,647 that would cause allocation size to exceed maxInt - // (The earlier check catches values > 2,147,483,647) - if entryCount > 0 { - // Check if entryCount * IndexEntrySize would exceed maxInt when converted to int - // This prevents allocations that would exceed Go's slice size limits - if int(entryCount) > maxInt/int(IndexEntrySize) { - return totalRead, pkgerrors.WrapErrorWithContext( - fmt.Errorf("entry count %d would require allocation exceeding maximum slice size", entryCount), - pkgerrors.ErrTypeValidation, - "entry count exceeds maximum allocation size", - 
pkgerrors.ValidationErrorContext{ - Field: "EntryCount", - Value: entryCount, - Expected: fmt.Sprintf("value <= %d", maxInt/int(IndexEntrySize)), - }, - ) - } + if entryCount > 0 && int(entryCount) > maxInt/int(IndexEntrySize) { + return pkgerrors.WrapErrorWithContext( + fmt.Errorf("entry count %d would require allocation exceeding maximum slice size", entryCount), + pkgerrors.ErrTypeValidation, "entry count exceeds maximum allocation size", + pkgerrors.ValidationErrorContext{Field: "EntryCount", Value: entryCount, Expected: fmt.Sprintf("value <= %d", maxInt/int(IndexEntrySize))}, + ) + } + if requiredBytes <= 1024*1024*1024 { + return nil } - - // Check available system memory dynamically to prevent OOM - // This respects actual system constraints rather than hard-coded limits var memStats runtime.MemStats runtime.ReadMemStats(&memStats) - // For very large allocations (>1GB), check against available system memory - // Use a conservative estimate based on system memory statistics - if requiredBytes > 1024*1024*1024 { // > 1GB - // For large allocations, require that it's less than 50% of system memory - // or less than 10GB if system memory is unknown/very large - maxReasonableAllocation := uint64(10 * 1024 * 1024 * 1024) // 10GB default - if memStats.Sys > 0 && memStats.Sys < maxReasonableAllocation*2 { - maxReasonableAllocation = memStats.Sys / 2 - } - if requiredBytes > maxReasonableAllocation { - return totalRead, pkgerrors.WrapErrorWithContext( - fmt.Errorf("entry count %d would require %d bytes (%d GB), exceeding available system memory", entryCount, requiredBytes, requiredBytes/(1024*1024*1024)), - pkgerrors.ErrTypeValidation, - "entry count exceeds available system memory", - pkgerrors.ValidationErrorContext{ - Field: "EntryCount", - Value: entryCount, - Expected: "value within available system memory constraints", - }, - ) - } + maxReasonableAllocation := uint64(10 * 1024 * 1024 * 1024) + if memStats.Sys > 0 && memStats.Sys < maxReasonableAllocation*2 { + 
maxReasonableAllocation = memStats.Sys / 2 + } + if requiredBytes > maxReasonableAllocation { + return pkgerrors.WrapErrorWithContext( + fmt.Errorf("entry count %d would require %d bytes (%d GB), exceeding available system memory", entryCount, requiredBytes, requiredBytes/(1024*1024*1024)), + pkgerrors.ErrTypeValidation, "entry count exceeds available system memory", + pkgerrors.ValidationErrorContext{Field: "EntryCount", Value: entryCount, Expected: "value within available system memory constraints"}, + ) } + return nil +} - // Attempt allocation - system will naturally limit based on available memory +// Specification: package_file_format.md: 6 File Index Section +func (f *FileIndex) readFrom(r io.Reader) (int64, error) { + entryCount, reserved, firstEntryOffset, totalRead, err := readFileIndexHeader(r) + if err != nil { + return totalRead, err + } + f.EntryCount = entryCount + f.Reserved = reserved + f.FirstEntryOffset = firstEntryOffset + if err := validateEntryCountAllocation(entryCount, totalRead); err != nil { + return totalRead, err + } f.Entries = make([]IndexEntry, 0, entryCount) - for i := uint32(0); i < entryCount; i++ { var entry IndexEntry if err := binary.Read(r, binary.LittleEndian, &entry); err != nil { return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, fmt.Sprintf("failed to read entry %d", i), pkgerrors.ValidationErrorContext{ - Field: "Entries", - Value: i, - Expected: "valid index entry", + Field: "Entries", Value: i, Expected: "valid index entry", }) } totalRead += IndexEntrySize f.Entries = append(f.Entries, entry) } - return totalRead, nil } @@ -300,7 +242,7 @@ func (f *FileIndex) ReadFrom(r io.Reader) (int64, error) { // Returns the number of bytes written and any error encountered. 
// // Specification: package_file_format.md: 6 File Index Section -func (f *FileIndex) WriteTo(w io.Writer) (int64, error) { +func (f *FileIndex) writeTo(w io.Writer) (int64, error) { var totalWritten int64 // Update EntryCount to match actual entries diff --git a/api/go/fileformat/fileindex_test.go b/api/go/fileformat/fileindex_test.go index 4b521768..ae8384b6 100644 --- a/api/go/fileformat/fileindex_test.go +++ b/api/go/fileformat/fileindex_test.go @@ -11,6 +11,34 @@ import ( "github.com/novus-engine/novuspack/api/go/pkgerrors" ) +// fileIndexWithManyEntries returns a FileIndex with 5 entries for use in tests. +func fileIndexWithManyEntries() FileIndex { + return FileIndex{ + EntryCount: 5, + Reserved: 0, + Entries: []IndexEntry{ + {FileID: 1, Offset: 112}, + {FileID: 2, Offset: 256}, + {FileID: 3, Offset: 512}, + {FileID: 4, Offset: 1024}, + {FileID: 5, Offset: 2048}, + }, + } +} + +// compareFileIndexEntries compares index.Entries with want.Entries and reports mismatches via t. +func compareFileIndexEntries(t *testing.T, index, want FileIndex) { + t.Helper() + for i, entry := range index.Entries { + if entry.FileID != want.Entries[i].FileID { + t.Errorf("Entry[%d].FileID = %d, want %d", i, entry.FileID, want.Entries[i].FileID) + } + if entry.Offset != want.Entries[i].Offset { + t.Errorf("Entry[%d].Offset = %d, want %d", i, entry.Offset, want.Entries[i].Offset) + } + } +} + // TestIndexEntrySize verifies IndexEntry is exactly 16 bytes func TestIndexEntrySize(t *testing.T) { var entry IndexEntry @@ -82,7 +110,7 @@ func TestFileIndexValidation(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - err := tt.index.Validate() + err := tt.index.validate() if (err != nil) != tt.wantErr { t.Errorf("Validate() error = %v, wantErr %v", err, tt.wantErr) } @@ -109,8 +137,8 @@ func TestFileIndexSizeCalculation(t *testing.T) { Entries: make([]IndexEntry, tt.entryCount), } - if index.Size() != tt.wantSize { - t.Errorf("Size() = %d, want %d", 
index.Size(), tt.wantSize) + if index.size() != tt.wantSize { + t.Errorf("size() = %d, want %d", index.size(), tt.wantSize) } }) } @@ -139,6 +167,8 @@ func TestNewFileIndex(t *testing.T) { // TestFileIndexReadFrom verifies ReadFrom deserialization // Specification: package_file_format.md: 6 File Index Section +// +//nolint:gocognit // table-driven test with multiple cases func TestFileIndexReadFrom(t *testing.T) { tests := []struct { name string @@ -168,17 +198,7 @@ func TestFileIndexReadFrom(t *testing.T) { }, { "Index with many entries", - FileIndex{ - EntryCount: 5, - Reserved: 0, - Entries: []IndexEntry{ - {FileID: 1, Offset: 112}, - {FileID: 2, Offset: 256}, - {FileID: 3, Offset: 512}, - {FileID: 4, Offset: 1024}, - {FileID: 5, Offset: 2048}, - }, - }, + fileIndexWithManyEntries(), false, }, } @@ -200,7 +220,7 @@ func TestFileIndexReadFrom(t *testing.T) { // Deserialize using ReadFrom var index FileIndex - n, err := index.ReadFrom(buf) + n, err := index.readFrom(buf) if (err != nil) != tt.wantErr { t.Errorf("ReadFrom() error = %v, wantErr %v", err, tt.wantErr) @@ -208,7 +228,7 @@ func TestFileIndexReadFrom(t *testing.T) { } if !tt.wantErr { - expectedSize := tt.index.Size() + expectedSize := tt.index.size() if n != int64(expectedSize) { t.Errorf("ReadFrom() read %d bytes, want %d", n, expectedSize) } @@ -225,17 +245,10 @@ func TestFileIndexReadFrom(t *testing.T) { } // Verify entries match - for i, entry := range index.Entries { - if entry.FileID != tt.index.Entries[i].FileID { - t.Errorf("Entry[%d].FileID = %d, want %d", i, entry.FileID, tt.index.Entries[i].FileID) - } - if entry.Offset != tt.index.Entries[i].Offset { - t.Errorf("Entry[%d].Offset = %d, want %d", i, entry.Offset, tt.index.Entries[i].Offset) - } - } + compareFileIndexEntries(t, index, tt.index) // Verify validation passes - if err := index.Validate(); err != nil { + if err := index.validate(); err != nil { t.Errorf("ReadFrom() index validation failed: %v", err) } } @@ -268,7 +281,7 @@ func 
TestFileIndexReadFromIncompleteData(t *testing.T) { }()}, {"Header with EntryCount>0 but no entries", func() []byte { buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(2)) // EntryCount = 2 + _ = binary.Write(buf, binary.LittleEndian, uint32(2)) _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // Reserved _ = binary.Write(buf, binary.LittleEndian, uint64(0)) // FirstEntryOffset // Only 16 bytes, but EntryCount says 2 entries needed (32 more bytes) @@ -276,7 +289,7 @@ func TestFileIndexReadFromIncompleteData(t *testing.T) { }()}, {"Header with EntryCount>0 but incomplete first entry", func() []byte { buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(2)) // EntryCount = 2 + _ = binary.Write(buf, binary.LittleEndian, uint32(2)) _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // Reserved _ = binary.Write(buf, binary.LittleEndian, uint64(0)) // FirstEntryOffset _ = binary.Write(buf, binary.LittleEndian, uint64(1)) // First entry FileID @@ -285,7 +298,7 @@ func TestFileIndexReadFromIncompleteData(t *testing.T) { }()}, {"Header with EntryCount>0 but incomplete second entry", func() []byte { buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(2)) // EntryCount = 2 + _ = binary.Write(buf, binary.LittleEndian, uint32(2)) _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // Reserved _ = binary.Write(buf, binary.LittleEndian, uint64(0)) // FirstEntryOffset _ = binary.Write(buf, binary.LittleEndian, uint64(1)) // First entry FileID @@ -300,7 +313,7 @@ func TestFileIndexReadFromIncompleteData(t *testing.T) { t.Run(tt.name, func(t *testing.T) { var index FileIndex r := bytes.NewReader(tt.data) - _, err := index.ReadFrom(r) + _, err := index.readFrom(r) if err == nil { t.Errorf("ReadFrom() expected error for incomplete data, got nil") @@ -311,6 +324,8 @@ func TestFileIndexReadFromIncompleteData(t *testing.T) { // TestFileIndexWriteTo verifies WriteTo serialization // Specification: 
package_file_format.md: 6 File Index Section +// +//nolint:gocognit // table-driven test func TestFileIndexWriteTo(t *testing.T) { tests := []struct { name string @@ -351,17 +366,7 @@ func TestFileIndexWriteTo(t *testing.T) { }, { "Index with many entries", - FileIndex{ - EntryCount: 5, - Reserved: 0, - Entries: []IndexEntry{ - {FileID: 1, Offset: 112}, - {FileID: 2, Offset: 256}, - {FileID: 3, Offset: 512}, - {FileID: 4, Offset: 1024}, - {FileID: 5, Offset: 2048}, - }, - }, + fileIndexWithManyEntries(), false, }, { @@ -446,7 +451,7 @@ func TestFileIndexWriteTo(t *testing.T) { tt.index.EntryCount = uint32(len(tt.index.Entries)) var buf bytes.Buffer - n, err := tt.index.WriteTo(&buf) + n, err := tt.index.writeTo(&buf) if (err != nil) != tt.wantErr { t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) @@ -454,7 +459,7 @@ func TestFileIndexWriteTo(t *testing.T) { } if !tt.wantErr { - expectedSize := tt.index.Size() + expectedSize := tt.index.size() if n != int64(expectedSize) { t.Errorf("WriteTo() wrote %d bytes, want %d", n, expectedSize) } @@ -465,7 +470,7 @@ func TestFileIndexWriteTo(t *testing.T) { // Verify we can read it back var index FileIndex - _, readErr := index.ReadFrom(&buf) + _, readErr := index.readFrom(&buf) if readErr != nil { t.Errorf("Failed to read back written data: %v", readErr) } @@ -479,6 +484,8 @@ func TestFileIndexWriteTo(t *testing.T) { } // TestFileIndexWriteToErrorPaths verifies WriteTo error handling +// +//nolint:gocognit // table-driven error paths func TestFileIndexWriteToErrorPaths(t *testing.T) { tests := []struct { name string @@ -591,7 +598,7 @@ func TestFileIndexWriteToErrorPaths(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { tt.index.EntryCount = uint32(len(tt.index.Entries)) - _, err := tt.index.WriteTo(tt.writer) + _, err := tt.index.writeTo(tt.writer) if (err != nil) != tt.wantErr { t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) @@ -611,6 +618,8 @@ func 
TestFileIndexWriteToErrorPaths(t *testing.T) { } // TestFileIndexRoundTrip verifies round-trip serialization +// +//nolint:gocognit // table-driven round-trip func TestFileIndexRoundTrip(t *testing.T) { tests := []struct { name string @@ -664,13 +673,13 @@ func TestFileIndexRoundTrip(t *testing.T) { // Write var buf bytes.Buffer - if _, err := tt.index.WriteTo(&buf); err != nil { + if _, err := tt.index.writeTo(&buf); err != nil { t.Fatalf("WriteTo() error = %v", err) } // Read var index FileIndex - if _, err := index.ReadFrom(&buf); err != nil { + if _, err := index.readFrom(&buf); err != nil { t.Fatalf("ReadFrom() error = %v", err) } @@ -686,17 +695,10 @@ func TestFileIndexRoundTrip(t *testing.T) { } // Compare entries - for i, entry := range index.Entries { - if entry.FileID != tt.index.Entries[i].FileID { - t.Errorf("Entry[%d].FileID mismatch: %d != %d", i, entry.FileID, tt.index.Entries[i].FileID) - } - if entry.Offset != tt.index.Entries[i].Offset { - t.Errorf("Entry[%d].Offset mismatch: %d != %d", i, entry.Offset, tt.index.Entries[i].Offset) - } - } + compareFileIndexEntries(t, index, tt.index) // Validate - if err := index.Validate(); err != nil { + if err := index.validate(); err != nil { t.Errorf("Round-trip index validation failed: %v", err) } @@ -778,68 +780,62 @@ func TestFileIndexReadFrom_OOMPrevention(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - // Skip tests that require very large allocations if they might cause issues if tt.entryCount > 1000000 && testing.Short() { t.Skip("Skipping large allocation test in short mode") } - header := createIndexHeader(tt.entryCount) r := bytes.NewReader(header) - var index FileIndex - _, err := index.ReadFrom(r) + _, err := index.readFrom(r) + assertOOMPreventionResult(t, err, tt.errorSubstr, tt.allowAnyValidationError) + }) + } +} - if tt.errorSubstr != "" { - // This test case should fail with a specific error - if err == nil { - t.Errorf("ReadFrom() expected error containing 
%q, got nil", tt.errorSubstr) - return - } +func assertOOMPreventionResult(t *testing.T, err error, errorSubstr string, allowAnyValidationError bool) { + t.Helper() + if errorSubstr != "" { + assertOOMPreventionErrorWithSubstr(t, err, errorSubstr) + return + } + if allowAnyValidationError { + assertOOMPreventionValidationError(t, err) + return + } + if err != nil { + assertOOMPreventionValidationErrorIfPresent(t, err) + } +} - errStr := err.Error() - if !strings.Contains(errStr, tt.errorSubstr) { - t.Errorf("ReadFrom() error = %q, want error containing %q", errStr, tt.errorSubstr) - } +func assertOOMPreventionErrorWithSubstr(t *testing.T, err error, substr string) { + t.Helper() + if err == nil { + t.Errorf("ReadFrom() expected error containing %q, got nil", substr) + return + } + if !strings.Contains(err.Error(), substr) { + t.Errorf("ReadFrom() error = %q, want error containing %q", err.Error(), substr) + } + assertOOMPreventionValidationError(t, err) +} - // Verify it's a validation error - var pkgErr *pkgerrors.PackageError - if !pkgerrors.As(err, &pkgErr) { - t.Errorf("ReadFrom() error is not a PackageError: %T", err) - return - } - if pkgErr.Type != pkgerrors.ErrTypeValidation { - t.Errorf("ReadFrom() error type = %v, want %v", pkgErr.Type, pkgerrors.ErrTypeValidation) - } - } else if tt.allowAnyValidationError { - // This test case should fail with any validation error (OOM prevention) - if err == nil { - t.Errorf("ReadFrom() expected validation error for OOM prevention, got nil") - return - } +func assertOOMPreventionValidationError(t *testing.T, err error) { + t.Helper() + var pkgErr *pkgerrors.PackageError + if !pkgerrors.As(err, &pkgErr) { + t.Errorf("ReadFrom() error is not a PackageError: %T", err) + return + } + if pkgErr.Type != pkgerrors.ErrTypeValidation { + t.Errorf("ReadFrom() error type = %v, want %v", pkgErr.Type, pkgerrors.ErrTypeValidation) + } +} - // Verify it's a validation error - var pkgErr *pkgerrors.PackageError - if !pkgerrors.As(err, 
&pkgErr) { - t.Errorf("ReadFrom() error is not a PackageError: %T", err) - return - } - if pkgErr.Type != pkgerrors.ErrTypeValidation { - t.Errorf("ReadFrom() error type = %v, want %v", pkgErr.Type, pkgerrors.ErrTypeValidation) - } - } else { - // This test case may pass or fail depending on system constraints - // Just verify it doesn't panic and handles the case gracefully - if err != nil { - // If it fails, it should be a validation error - var pkgErr *pkgerrors.PackageError - if pkgerrors.As(err, &pkgErr) { - if pkgErr.Type != pkgerrors.ErrTypeValidation { - t.Errorf("ReadFrom() error type = %v, want %v", pkgErr.Type, pkgerrors.ErrTypeValidation) - } - } - } - } - }) +func assertOOMPreventionValidationErrorIfPresent(t *testing.T, err error) { + t.Helper() + var pkgErr *pkgerrors.PackageError + if pkgerrors.As(err, &pkgErr) && pkgErr.Type != pkgerrors.ErrTypeValidation { + t.Errorf("ReadFrom() error type = %v, want %v", pkgErr.Type, pkgerrors.ErrTypeValidation) } } @@ -857,7 +853,7 @@ func TestFileIndexReadFrom_OOMPrevention_MaxIntBoundary(t *testing.T) { _ = binary.Write(buf, binary.LittleEndian, uint64(0)) // FirstEntryOffset var index FileIndex - _, err := index.ReadFrom(buf) + _, err := index.readFrom(buf) // maxUint32 should trigger one of the validation errors // (likely multiplication overflow or maxInt/int(IndexEntrySize) check) @@ -891,7 +887,7 @@ func TestFileIndexReadFrom_OOMPrevention_MultiplicationOverflow(t *testing.T) { _ = binary.Write(buf, binary.LittleEndian, uint64(0)) // FirstEntryOffset var index FileIndex - _, err := index.ReadFrom(buf) + _, err := index.readFrom(buf) if err == nil { t.Error("ReadFrom() expected error for multiplication overflow, got nil") @@ -900,7 +896,8 @@ func TestFileIndexReadFrom_OOMPrevention_MultiplicationOverflow(t *testing.T) { // Check if it's the multiplication overflow error errStr := err.Error() - if strings.Contains(errStr, "calculation overflow") { + switch { + case strings.Contains(errStr, "calculation 
overflow"): // This is the correct error path var pkgErr *pkgerrors.PackageError if !pkgerrors.As(err, &pkgErr) { @@ -910,10 +907,10 @@ func TestFileIndexReadFrom_OOMPrevention_MultiplicationOverflow(t *testing.T) { if pkgErr.Type != pkgerrors.ErrTypeValidation { t.Errorf("ReadFrom() error type = %v, want %v", pkgErr.Type, pkgerrors.ErrTypeValidation) } - } else if strings.Contains(errStr, "exceeds maximum allocation size") { + case strings.Contains(errStr, "exceeds maximum allocation size"): // This is also acceptable - it means we hit the maxInt/int(IndexEntrySize) check first // Both are valid OOM prevention paths - } else { + default: // Any validation error is acceptable for OOM prevention var pkgErr *pkgerrors.PackageError if pkgerrors.As(err, &pkgErr) && pkgErr.Type == pkgerrors.ErrTypeValidation { diff --git a/api/go/fileformat/header.go b/api/go/fileformat/header.go index ada9f4d1..e5fbf5e9 100644 --- a/api/go/fileformat/header.go +++ b/api/go/fileformat/header.go @@ -129,7 +129,7 @@ type PackageHeader struct { // Returns an error if any validation check fails. // // Specification: package_file_format.md: 2.1 Header Structure -func (h *PackageHeader) Validate() error { +func (h *PackageHeader) validate() error { // Validate magic number if h.Magic != NVPKMagic { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "invalid magic number", nil, pkgerrors.ValidationErrorContext{ @@ -169,7 +169,7 @@ func (h *PackageHeader) Validate() error { // - 3: LZMA compression // // Specification: package_file_format.md: 2.5 Package Features Flags -func (h *PackageHeader) GetCompressionType() uint8 { +func (h *PackageHeader) getCompressionType() uint8 { return uint8((h.Flags & FlagsMaskCompressionType) >> FlagsShiftCompressionType) } @@ -179,7 +179,7 @@ func (h *PackageHeader) GetCompressionType() uint8 { // Preserves existing feature flags (bits 0-7). 
// // Specification: package_file_format.md: 2.5 Package Features Flags -func (h *PackageHeader) SetCompressionType(compressionType uint8) { +func (h *PackageHeader) setCompressionType(compressionType uint8) { // Clear compression type bits h.Flags &= ^uint32(FlagsMaskCompressionType) // Set new compression type @@ -191,28 +191,28 @@ func (h *PackageHeader) SetCompressionType(compressionType uint8) { // Returns the feature flags (bits 0-7) as a bitmask. // // Specification: package_file_format.md: 2.5 Package Features Flags -func (h *PackageHeader) GetFeatures() uint8 { +func (h *PackageHeader) getFeatures() uint8 { return uint8(h.Flags & FlagsMaskFeatures) } // HasFeature checks if a specific feature flag is set. // // Specification: package_file_format.md: 2.5 Package Features Flags -func (h *PackageHeader) HasFeature(flag uint32) bool { +func (h *PackageHeader) hasFeature(flag uint32) bool { return (h.Flags & flag) != 0 } // SetFeature sets a specific feature flag. // // Specification: package_file_format.md: 2.5 Package Features Flags -func (h *PackageHeader) SetFeature(flag uint32) { +func (h *PackageHeader) setFeature(flag uint32) { h.Flags |= flag } // ClearFeature clears a specific feature flag. // // Specification: package_file_format.md: 2.5 Package Features Flags -func (h *PackageHeader) ClearFeature(flag uint32) { +func (h *PackageHeader) clearFeature(flag uint32) { h.Flags &= ^flag } @@ -221,7 +221,7 @@ func (h *PackageHeader) ClearFeature(flag uint32) { // Returns the part number (bits 31-16). // // Specification: package_file_format.md: 2.6 ArchivePartInfo Field Specification -func (h *PackageHeader) GetArchivePart() uint16 { +func (h *PackageHeader) getArchivePart() uint16 { return uint16(h.ArchivePartInfo >> 16) } @@ -230,28 +230,28 @@ func (h *PackageHeader) GetArchivePart() uint16 { // Returns the total parts (bits 15-0). 
// // Specification: package_file_format.md: 2.6 ArchivePartInfo Field Specification -func (h *PackageHeader) GetArchiveTotal() uint16 { +func (h *PackageHeader) getArchiveTotal() uint16 { return uint16(h.ArchivePartInfo & 0xFFFF) } // SetArchivePartInfo sets both part number and total parts in ArchivePartInfo. // // Specification: package_file_format.md: 2.6 ArchivePartInfo Field Specification -func (h *PackageHeader) SetArchivePartInfo(part, total uint16) { +func (h *PackageHeader) setArchivePartInfo(part, total uint16) { h.ArchivePartInfo = (uint32(part) << 16) | uint32(total) } // IsSigned returns true if the package has signatures. // // Specification: package_file_format.md: 2.9 Signed Package File Immutability and Incremental Signatures -func (h *PackageHeader) IsSigned() bool { +func (h *PackageHeader) isSigned() bool { return h.SignatureOffset > 0 } // HasComment returns true if the package has a comment. // // Specification: package_file_format.md: 7.1 Package Comment Format Specification -func (h *PackageHeader) HasComment() bool { +func (h *PackageHeader) hasComment() bool { return h.CommentSize > 0 } @@ -306,7 +306,7 @@ func NewPackageHeader() *PackageHeader { // If the magic number is invalid, returns a validation error. // // Specification: package_file_format.md: 2.1 Header Structure -func (h *PackageHeader) ReadFrom(r io.Reader) (int64, error) { +func (h *PackageHeader) readFrom(r io.Reader) (int64, error) { var totalRead int64 // Read all fields in order using binary.Read for proper little-endian handling @@ -347,7 +347,7 @@ func (h *PackageHeader) ReadFrom(r io.Reader) (int64, error) { // Returns the number of bytes written and any error encountered. 
// // Specification: package_file_format.md: 2.1 Header Structure -func (h *PackageHeader) WriteTo(w io.Writer) (int64, error) { +func (h *PackageHeader) writeTo(w io.Writer) (int64, error) { var totalWritten int64 // Write all fields in order using binary.Write for proper little-endian handling diff --git a/api/go/fileformat/header_test.go b/api/go/fileformat/header_test.go index e6eeeac2..493d31b4 100644 --- a/api/go/fileformat/header_test.go +++ b/api/go/fileformat/header_test.go @@ -10,21 +10,9 @@ import ( "github.com/novus-engine/novuspack/api/go/internal/testhelpers" ) -// TestPackageHeaderSize verifies the PackageHeader struct is exactly 112 bytes -// Specification: package_file_format.md: 2.1 Header Structure -func TestPackageHeaderSize(t *testing.T) { - var header PackageHeader - size := binary.Size(header) - - if size != PackageHeaderSize { - t.Errorf("PackageHeader size = %d bytes, want %d bytes", size, PackageHeaderSize) - } -} - -// TestPackageHeaderFieldTypes verifies all fields have correct types -// Specification: package_file_format.md: 2.1 Header Structure -func TestPackageHeaderFieldTypes(t *testing.T) { - header := PackageHeader{ +// packageHeaderMinimal returns a PackageHeader with minimal/default field values (Flags 0, ArchivePartInfo Part 1 of 1). 
+func packageHeaderMinimal() PackageHeader { + return PackageHeader{ Magic: NVPKMagic, FormatVersion: FormatVersion, Flags: 0, @@ -46,27 +34,18 @@ func TestPackageHeaderFieldTypes(t *testing.T) { CommentStart: 0, SignatureOffset: 0, } +} - // Verify Magic is uint32 - if header.Magic != NVPKMagic { - t.Errorf("Magic = 0x%X, want 0x%X", header.Magic, NVPKMagic) - } - - // Verify FormatVersion is uint32 - if header.FormatVersion != FormatVersion { - t.Errorf("FormatVersion = %d, want %d", header.FormatVersion, FormatVersion) - } - - // Verify ArchivePartInfo default value - if header.ArchivePartInfo != 0x00010001 { - t.Errorf("ArchivePartInfo = 0x%X, want 0x00010001", header.ArchivePartInfo) - } +// packageHeaderMinimalZeroPart returns a PackageHeader with minimal fields and ArchivePartInfo 0. +func packageHeaderMinimalZeroPart() PackageHeader { + h := packageHeaderMinimal() + h.ArchivePartInfo = 0 + return h } -// TestPackageHeaderSerialization verifies binary serialization/deserialization -// Specification: package_file_format.md: 2.1 Header Structure -func TestPackageHeaderSerialization(t *testing.T) { - original := PackageHeader{ +// packageHeaderForSerialization returns a full PackageHeader used in serialization/ReadFrom tests. +func packageHeaderForSerialization() PackageHeader { + return PackageHeader{ Magic: NVPKMagic, FormatVersion: FormatVersion, Flags: FlagHasSignatures | FlagHasCompressedFiles, @@ -88,6 +67,98 @@ func TestPackageHeaderSerialization(t *testing.T) { CommentStart: 0, SignatureOffset: 0, } +} + +// packageHeaderWithAllFieldsSet returns a PackageHeader with all fields set (for WriteTo/ReadFrom tests). +func packageHeaderWithAllFieldsSet() PackageHeader { + return packageHeaderFull(42, 17, 0xDEADBEEF, 0x00020003, 100, 0x0411) +} + +// packageHeaderRoundTripFull returns a full PackageHeader for round-trip tests (different metadata values). 
+func packageHeaderRoundTripFull() PackageHeader { + return packageHeaderFull(100, 50, 0xABCDEF00, 0x0005000A, 200, 0x0409) +} + +func packageHeaderFull(pkgDataVer, metaVer, crc, partInfo, commentSize, localeID uint32) PackageHeader { + return PackageHeader{ + Magic: NVPKMagic, + FormatVersion: FormatVersion, + Flags: 0x01FF, + PackageDataVersion: pkgDataVer, + MetadataVersion: metaVer, + PackageCRC: crc, + CreatedTime: 1638360000000000000, + ModifiedTime: 1638361000000000000, + LocaleID: localeID, + Reserved: 0, + AppID: 730, + VendorID: VendorIDSteam, + CreatorID: 0, + IndexStart: 8192, + IndexSize: 2048, + ArchiveChainID: 0x123456789ABCDEF0, + ArchivePartInfo: partInfo, + CommentSize: commentSize, + CommentStart: 6144, + SignatureOffset: 10240, + } +} + +// headerGetterTestCase is a single test case for a getter on PackageHeader. +type headerGetterTestCase[T comparable] struct { + name string + header PackageHeader + want T +} + +// runHeaderGetterTests runs table-driven tests for a header getter; T must be comparable. 
+func runHeaderGetterTests[T comparable](t *testing.T, tests []headerGetterTestCase[T], getter func(PackageHeader) T, format string) { + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + got := getter(tt.header) + if got != tt.want { + t.Errorf(format, got, tt.want) + } + }) + } +} + +// TestPackageHeaderSize verifies the PackageHeader struct is exactly 112 bytes +// Specification: package_file_format.md: 2.1 Header Structure +func TestPackageHeaderSize(t *testing.T) { + var header PackageHeader + size := binary.Size(header) + + if size != PackageHeaderSize { + t.Errorf("PackageHeader size = %d bytes, want %d bytes", size, PackageHeaderSize) + } +} + +// TestPackageHeaderFieldTypes verifies all fields have correct types +// Specification: package_file_format.md: 2.1 Header Structure +func TestPackageHeaderFieldTypes(t *testing.T) { + header := packageHeaderMinimal() + + // Verify Magic is uint32 + if header.Magic != NVPKMagic { + t.Errorf("Magic = 0x%X, want 0x%X", header.Magic, NVPKMagic) + } + + // Verify FormatVersion is uint32 + if header.FormatVersion != FormatVersion { + t.Errorf("FormatVersion = %d, want %d", header.FormatVersion, FormatVersion) + } + + // Verify ArchivePartInfo default value + if header.ArchivePartInfo != 0x00010001 { + t.Errorf("ArchivePartInfo = 0x%X, want 0x00010001", header.ArchivePartInfo) + } +} + +// TestPackageHeaderSerialization verifies binary serialization/deserialization +// Specification: package_file_format.md: 2.1 Header Structure +func TestPackageHeaderSerialization(t *testing.T) { + original := packageHeaderForSerialization() // Serialize to bytes buf := new(bytes.Buffer) @@ -136,7 +207,7 @@ func TestPackageHeaderMagicValidation(t *testing.T) { Magic: tt.magic, FormatVersion: tt.version, } - err := header.Validate() + err := header.validate() if tt.wantErr && err == nil { t.Error("Validate() expected error, got nil") @@ -157,7 +228,7 @@ func TestPackageHeaderReservedFieldValidation(t *testing.T) { Reserved: 
1, // Non-zero reserved field } - err := header.Validate() + err := header.validate() if err == nil { t.Error("Validate() expected error for non-zero reserved field, got nil") } @@ -183,7 +254,7 @@ func TestPackageHeaderFormatVersionValidation(t *testing.T) { Magic: NVPKMagic, FormatVersion: tt.formatVersion, } - err := header.Validate() + err := header.validate() if (err != nil) != tt.wantErr { t.Errorf("Validate() error = %v, wantErr %v", err, tt.wantErr) @@ -292,28 +363,15 @@ func TestPackageHeaderVersionFields(t *testing.T) { // TestPackageHeaderGetCompressionType verifies GetCompressionType extraction // Specification: package_file_format.md: 2.5 Package Features Flags func TestPackageHeaderGetCompressionType(t *testing.T) { - tests := []struct { - name string - flags uint32 - wantCompression uint8 - }{ - {"No compression", 0x0000, CompressionNone}, - {"Zstd compression", 0x0100, CompressionZstd}, - {"LZ4 compression", 0x0200, CompressionLZ4}, - {"LZMA compression", 0x0300, CompressionLZMA}, - {"Zstd with features", 0x01FF, CompressionZstd}, - {"LZ4 with features", 0x02FF, CompressionLZ4}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - header := PackageHeader{Flags: tt.flags} - got := header.GetCompressionType() - if got != tt.wantCompression { - t.Errorf("GetCompressionType() = %d, want %d", got, tt.wantCompression) - } - }) - } + tests := []headerGetterTestCase[uint8]{ + {"No compression", PackageHeader{Flags: 0x0000}, CompressionNone}, + {"Zstd compression", PackageHeader{Flags: 0x0100}, CompressionZstd}, + {"LZ4 compression", PackageHeader{Flags: 0x0200}, CompressionLZ4}, + {"LZMA compression", PackageHeader{Flags: 0x0300}, CompressionLZMA}, + {"Zstd with features", PackageHeader{Flags: 0x01FF}, CompressionZstd}, + {"LZ4 with features", PackageHeader{Flags: 0x02FF}, CompressionLZ4}, + } + runHeaderGetterTests(t, tests, func(h PackageHeader) uint8 { return h.getCompressionType() }, "getCompressionType() = %d, want %d") } // 
TestPackageHeaderSetCompressionType verifies SetCompressionType preserves features @@ -338,10 +396,10 @@ func TestPackageHeaderSetCompressionType(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { header := PackageHeader{Flags: tt.initialFlags} - header.SetCompressionType(tt.compressionType) + header.setCompressionType(tt.compressionType) - gotCompression := header.GetCompressionType() - gotFeatures := header.GetFeatures() + gotCompression := header.getCompressionType() + gotFeatures := header.getFeatures() if gotCompression != tt.wantCompression { t.Errorf("Compression type = %d, want %d", gotCompression, tt.wantCompression) @@ -356,28 +414,15 @@ func TestPackageHeaderSetCompressionType(t *testing.T) { // TestPackageHeaderGetFeatures verifies GetFeatures extraction // Specification: package_file_format.md: 2.5 Package Features Flags func TestPackageHeaderGetFeatures(t *testing.T) { - tests := []struct { - name string - flags uint32 - wantFeatures uint8 - }{ - {"No features", 0x0000, 0x00}, - {"Has signatures", FlagHasSignatures, 0x01}, - {"Has compressed files", FlagHasCompressedFiles, 0x02}, - {"Has encrypted files", FlagHasEncryptedFiles, 0x04}, - {"All features", 0x00FF, 0xFF}, - {"Features with compression", 0x01FF, 0xFF}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - header := PackageHeader{Flags: tt.flags} - got := header.GetFeatures() - if got != tt.wantFeatures { - t.Errorf("GetFeatures() = 0x%02X, want 0x%02X", got, tt.wantFeatures) - } - }) - } + tests := []headerGetterTestCase[uint8]{ + {"No features", PackageHeader{Flags: 0x0000}, 0x00}, + {"Has signatures", PackageHeader{Flags: FlagHasSignatures}, 0x01}, + {"Has compressed files", PackageHeader{Flags: FlagHasCompressedFiles}, 0x02}, + {"Has encrypted files", PackageHeader{Flags: FlagHasEncryptedFiles}, 0x04}, + {"All features", PackageHeader{Flags: 0x00FF}, 0xFF}, + {"Features with compression", PackageHeader{Flags: 0x01FF}, 0xFF}, + } + 
runHeaderGetterTests(t, tests, func(h PackageHeader) uint8 { return h.getFeatures() }, "getFeatures() = 0x%02X, want 0x%02X") } // TestPackageHeaderHasFeature verifies HasFeature checking @@ -399,7 +444,7 @@ func TestPackageHeaderHasFeature(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - got := header.HasFeature(tt.flag) + got := header.hasFeature(tt.flag) if got != tt.want { t.Errorf("HasFeature(0x%08X) = %v, want %v", tt.flag, got, tt.want) } @@ -413,26 +458,26 @@ func TestPackageHeaderSetFeature(t *testing.T) { header := PackageHeader{Flags: 0x0000} // Set first feature - header.SetFeature(FlagHasSignatures) - if !header.HasFeature(FlagHasSignatures) { + header.setFeature(FlagHasSignatures) + if !header.hasFeature(FlagHasSignatures) { t.Error("SetFeature(FlagHasSignatures) did not set the flag") } // Set additional feature - header.SetFeature(FlagHasCompressedFiles) - if !header.HasFeature(FlagHasSignatures) { + header.setFeature(FlagHasCompressedFiles) + if !header.hasFeature(FlagHasSignatures) { t.Error("SetFeature(FlagHasCompressedFiles) cleared existing flag") } - if !header.HasFeature(FlagHasCompressedFiles) { + if !header.hasFeature(FlagHasCompressedFiles) { t.Error("SetFeature(FlagHasCompressedFiles) did not set the flag") } // Set multiple features - header.SetFeature(FlagHasEncryptedFiles | FlagHasPackageComment) - if !header.HasFeature(FlagHasEncryptedFiles) { + header.setFeature(FlagHasEncryptedFiles | FlagHasPackageComment) + if !header.hasFeature(FlagHasEncryptedFiles) { t.Error("SetFeature did not set FlagHasEncryptedFiles") } - if !header.HasFeature(FlagHasPackageComment) { + if !header.hasFeature(FlagHasPackageComment) { t.Error("SetFeature did not set FlagHasPackageComment") } } @@ -443,74 +488,49 @@ func TestPackageHeaderClearFeature(t *testing.T) { header := PackageHeader{Flags: FlagHasSignatures | FlagHasCompressedFiles | FlagHasEncryptedFiles} // Clear one feature - header.ClearFeature(FlagHasSignatures) - if 
header.HasFeature(FlagHasSignatures) { + header.clearFeature(FlagHasSignatures) + if header.hasFeature(FlagHasSignatures) { t.Error("ClearFeature(FlagHasSignatures) did not clear the flag") } - if !header.HasFeature(FlagHasCompressedFiles) { + if !header.hasFeature(FlagHasCompressedFiles) { t.Error("ClearFeature(FlagHasSignatures) cleared wrong flag") } - if !header.HasFeature(FlagHasEncryptedFiles) { + if !header.hasFeature(FlagHasEncryptedFiles) { t.Error("ClearFeature(FlagHasSignatures) cleared wrong flag") } // Clear multiple features - header.ClearFeature(FlagHasCompressedFiles | FlagHasEncryptedFiles) - if header.HasFeature(FlagHasCompressedFiles) { + header.clearFeature(FlagHasCompressedFiles | FlagHasEncryptedFiles) + if header.hasFeature(FlagHasCompressedFiles) { t.Error("ClearFeature did not clear FlagHasCompressedFiles") } - if header.HasFeature(FlagHasEncryptedFiles) { + if header.hasFeature(FlagHasEncryptedFiles) { t.Error("ClearFeature did not clear FlagHasEncryptedFiles") } } -// TestPackageHeaderGetArchivePart verifies GetArchivePart extraction -// Specification: package_file_format.md: 2.6 ArchivePartInfo Field Specification -func TestPackageHeaderGetArchivePart(t *testing.T) { - tests := []struct { - name string - partInfo uint32 - wantPart uint16 - }{ - {"Part 1 of 1", 0x00010001, 1}, - {"Part 2 of 3", 0x00020003, 2}, - {"Part 0 of 0", 0x00000000, 0}, - {"Part 65535 of 65535", 0xFFFFFFFF, 65535}, - {"Part 10 of 20", 0x000A0014, 10}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - header := PackageHeader{ArchivePartInfo: tt.partInfo} - got := header.GetArchivePart() - if got != tt.wantPart { - t.Errorf("GetArchivePart() = %d, want %d", got, tt.wantPart) - } - }) - } -} - -// TestPackageHeaderGetArchiveTotal verifies GetArchiveTotal extraction +// TestPackageHeaderArchivePartInfoGetters verifies GetArchivePart and GetArchiveTotal extraction. 
// Specification: package_file_format.md: 2.6 ArchivePartInfo Field Specification -func TestPackageHeaderGetArchiveTotal(t *testing.T) { +func TestPackageHeaderArchivePartInfoGetters(t *testing.T) { tests := []struct { name string - partInfo uint32 + header PackageHeader + wantPart uint16 wantTotal uint16 }{ - {"Part 1 of 1", 0x00010001, 1}, - {"Part 2 of 3", 0x00020003, 3}, - {"Part 0 of 0", 0x00000000, 0}, - {"Part 65535 of 65535", 0xFFFFFFFF, 65535}, - {"Part 10 of 20", 0x000A0014, 20}, + {"Part 1 of 1", PackageHeader{ArchivePartInfo: 0x00010001}, 1, 1}, + {"Part 2 of 3", PackageHeader{ArchivePartInfo: 0x00020003}, 2, 3}, + {"Part 0 of 0", PackageHeader{ArchivePartInfo: 0x00000000}, 0, 0}, + {"Part 65535 of 65535", PackageHeader{ArchivePartInfo: 0xFFFFFFFF}, 65535, 65535}, + {"Part 10 of 20", PackageHeader{ArchivePartInfo: 0x000A0014}, 10, 20}, } - for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - header := PackageHeader{ArchivePartInfo: tt.partInfo} - got := header.GetArchiveTotal() - if got != tt.wantTotal { - t.Errorf("GetArchiveTotal() = %d, want %d", got, tt.wantTotal) + if got := tt.header.getArchivePart(); got != tt.wantPart { + t.Errorf("getArchivePart() = %d, want %d", got, tt.wantPart) + } + if got := tt.header.getArchiveTotal(); got != tt.wantTotal { + t.Errorf("getArchiveTotal() = %d, want %d", got, tt.wantTotal) } }) } @@ -536,10 +556,10 @@ func TestPackageHeaderSetArchivePartInfo(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { header := PackageHeader{} - header.SetArchivePartInfo(tt.part, tt.total) + header.setArchivePartInfo(tt.part, tt.total) - gotPart := header.GetArchivePart() - gotTotal := header.GetArchiveTotal() + gotPart := header.getArchivePart() + gotTotal := header.getArchiveTotal() if gotPart != tt.wantPart { t.Errorf("Part = %d, want %d", gotPart, tt.wantPart) @@ -551,54 +571,36 @@ func TestPackageHeaderSetArchivePartInfo(t *testing.T) { } } -// TestPackageHeaderIsSigned verifies IsSigned 
checking -// Specification: package_file_format.md: 2.9 Signed Package File Immutability and Incremental Signatures -func TestPackageHeaderIsSigned(t *testing.T) { - tests := []struct { - name string - signatureOffset uint64 - want bool - }{ - {"Not signed", 0, false}, - {"Signed", 4096, true}, - {"Signed at offset 1", 1, true}, - {"Signed at large offset", 0xFFFFFFFFFFFFFFFF, true}, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - header := PackageHeader{SignatureOffset: tt.signatureOffset} - got := header.IsSigned() - if got != tt.want { - t.Errorf("IsSigned() = %v, want %v", got, tt.want) - } - }) +// isSignedGetterTestCases returns test cases for IsSigned(). +func isSignedGetterTestCases() []headerGetterTestCase[bool] { + return []headerGetterTestCase[bool]{ + {"Not signed", PackageHeader{SignatureOffset: 0}, false}, + {"Signed", PackageHeader{SignatureOffset: 4096}, true}, + {"Signed at offset 1", PackageHeader{SignatureOffset: 1}, true}, + {"Signed at large offset", PackageHeader{SignatureOffset: 0xFFFFFFFFFFFFFFFF}, true}, } } -// TestPackageHeaderHasComment verifies HasComment checking -// Specification: package_file_format.md: 7.1 Package Comment Format Specification -func TestPackageHeaderHasComment(t *testing.T) { - tests := []struct { - name string - commentSize uint32 - want bool - }{ - {"No comment", 0, false}, - {"Has comment", 100, true}, - {"Has comment size 1", 1, true}, - {"Has large comment", 0xFFFFFFFF, true}, +// hasCommentGetterTestCases returns test cases for HasComment(). 
+func hasCommentGetterTestCases() []headerGetterTestCase[bool] { + return []headerGetterTestCase[bool]{ + {"No comment", PackageHeader{CommentSize: 0}, false}, + {"Has comment", PackageHeader{CommentSize: 100}, true}, + {"Has comment size 1", PackageHeader{CommentSize: 1}, true}, + {"Has large comment", PackageHeader{CommentSize: 0xFFFFFFFF}, true}, } +} - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - header := PackageHeader{CommentSize: tt.commentSize} - got := header.HasComment() - if got != tt.want { - t.Errorf("HasComment() = %v, want %v", got, tt.want) - } - }) - } +// TestPackageHeaderIsSignedAndHasComment verifies IsSigned and HasComment checking. +// Specification: package_file_format.md: 2.9 2.9 Signed Package File Immutability and Incremental Signatures +// Specification: package_file_format.md: 7.1 7.1 Package Comment Format Specification +func TestPackageHeaderIsSignedAndHasComment(t *testing.T) { + t.Run("IsSigned", func(t *testing.T) { + runHeaderGetterTests(t, isSignedGetterTestCases(), func(h PackageHeader) bool { return h.isSigned() }, "isSigned() = %v, want %v") + }) + t.Run("HasComment", func(t *testing.T) { + runHeaderGetterTests(t, hasCommentGetterTestCases(), func(h PackageHeader) bool { return h.hasComment() }, "hasComment() = %v, want %v") + }) } // TestNewPackageHeader verifies NewPackageHeader initializes correctly @@ -645,97 +647,24 @@ func TestNewPackageHeader(t *testing.T) { } // Verify it passes validation - if err := header.Validate(); err != nil { + if err := header.validate(); err != nil { t.Errorf("Validate() error = %v, want nil", err) } } // TestPackageHeaderReadFrom verifies ReadFrom deserialization // Specification: package_file_format.md: 2.1 Header Structure +// +//nolint:gocognit // table-driven test func TestPackageHeaderReadFrom(t *testing.T) { tests := []struct { name string header PackageHeader wantErr bool }{ - { - "Valid header", - PackageHeader{ - Magic: NVPKMagic, - FormatVersion: FormatVersion, 
- Flags: FlagHasSignatures | FlagHasCompressedFiles, - PackageDataVersion: 1, - MetadataVersion: 1, - PackageCRC: 0x12345678, - CreatedTime: 1638360000000000000, - ModifiedTime: 1638360000000000000, - LocaleID: 0x0409, - Reserved: 0, - AppID: 730, - VendorID: VendorIDSteam, - CreatorID: 0, - IndexStart: 4096, - IndexSize: 1024, - ArchiveChainID: 0, - ArchivePartInfo: 0x00010001, - CommentSize: 0, - CommentStart: 0, - SignatureOffset: 0, - }, - false, - }, - { - "Header with all fields set", - PackageHeader{ - Magic: NVPKMagic, - FormatVersion: FormatVersion, - Flags: 0x01FF, - PackageDataVersion: 42, - MetadataVersion: 17, - PackageCRC: 0xDEADBEEF, - CreatedTime: 1638360000000000000, - ModifiedTime: 1638361000000000000, - LocaleID: 0x0411, - Reserved: 0, - AppID: 730, - VendorID: VendorIDSteam, - CreatorID: 0, - IndexStart: 8192, - IndexSize: 2048, - ArchiveChainID: 0x123456789ABCDEF0, - ArchivePartInfo: 0x00020003, - CommentSize: 100, - CommentStart: 6144, - SignatureOffset: 10240, - }, - false, - }, - { - "Header with minimal fields", - PackageHeader{ - Magic: NVPKMagic, - FormatVersion: FormatVersion, - Flags: 0, - PackageDataVersion: 1, - MetadataVersion: 1, - PackageCRC: 0, - CreatedTime: 0, - ModifiedTime: 0, - LocaleID: 0, - Reserved: 0, - AppID: 0, - VendorID: 0, - CreatorID: 0, - IndexStart: 0, - IndexSize: 0, - ArchiveChainID: 0, - ArchivePartInfo: 0, - CommentSize: 0, - CommentStart: 0, - SignatureOffset: 0, - }, - false, - }, + {"Valid header", packageHeaderForSerialization(), false}, + {"Header with all fields set", packageHeaderWithAllFieldsSet(), false}, + {"Header with minimal fields", packageHeaderMinimalZeroPart(), false}, } for _, tt := range tests { @@ -754,7 +683,7 @@ func TestPackageHeaderReadFrom(t *testing.T) { // Deserialize using ReadFrom var header PackageHeader - n, err := header.ReadFrom(buf) + n, err := header.readFrom(buf) if (err != nil) != tt.wantErr { t.Errorf("ReadFrom() error = %v, wantErr %v", err, tt.wantErr) @@ -774,7 +703,7 
@@ func TestPackageHeaderReadFrom(t *testing.T) { } // Verify validation passes - if err := header.Validate(); err != nil { + if err := header.validate(); err != nil { t.Errorf("ReadFrom() header validation failed: %v", err) } } @@ -800,7 +729,7 @@ func TestPackageHeaderReadFromInvalidMagic(t *testing.T) { // Try to read var header PackageHeader - _, err = header.ReadFrom(buf) + _, err = header.readFrom(buf) if err == nil { t.Error("ReadFrom() expected error for invalid magic, got nil") } else if !strings.Contains(err.Error(), "magic") { @@ -844,7 +773,7 @@ func TestPackageHeaderReadFromIncompleteData(t *testing.T) { t.Run(tt.name, func(t *testing.T) { var header PackageHeader r := bytes.NewReader(tt.data) - _, err := header.ReadFrom(r) + _, err := header.readFrom(r) // Check if this is a valid case (complete header with invalid magic) isInvalidMagicCase := strings.Contains(tt.name, "Complete header with invalid magic") @@ -867,19 +796,22 @@ func TestPackageHeaderReadFromIncompleteData(t *testing.T) { func TestPackageHeaderReadFromNonEOFError(t *testing.T) { var header PackageHeader r := testhelpers.NewErrorReader() - _, err := header.ReadFrom(r) + _, err := header.readFrom(r) - if err == nil { + switch { + case err == nil: t.Error("ReadFrom() expected error for error reader, got nil") - } else if strings.Contains(err.Error(), "EOF") || strings.Contains(err.Error(), "incomplete") { + case strings.Contains(err.Error(), "EOF") || strings.Contains(err.Error(), "incomplete"): t.Errorf("ReadFrom() error = %q, want non-EOF error", err.Error()) - } else if !strings.Contains(err.Error(), "failed to read header") { + case !strings.Contains(err.Error(), "failed to read header"): t.Errorf("ReadFrom() error = %q, want error containing 'failed to read header'", err.Error()) } } // TestPackageHeaderWriteTo verifies WriteTo serialization // Specification: package_file_format.md: 2.1 Header Structure +// +//nolint:gocognit // table-driven test func TestPackageHeaderWriteTo(t 
*testing.T) { tests := []struct { name string @@ -899,38 +831,13 @@ func TestPackageHeaderWriteTo(t *testing.T) { }, false, }, - { - "Header with all fields", - PackageHeader{ - Magic: NVPKMagic, - FormatVersion: FormatVersion, - Flags: 0x01FF, - PackageDataVersion: 42, - MetadataVersion: 17, - PackageCRC: 0xDEADBEEF, - CreatedTime: 1638360000000000000, - ModifiedTime: 1638361000000000000, - LocaleID: 0x0411, - Reserved: 0, - AppID: 730, - VendorID: VendorIDSteam, - CreatorID: 0, - IndexStart: 8192, - IndexSize: 2048, - ArchiveChainID: 0x123456789ABCDEF0, - ArchivePartInfo: 0x00020003, - CommentSize: 100, - CommentStart: 6144, - SignatureOffset: 10240, - }, - false, - }, + {"Header with all fields", packageHeaderWithAllFieldsSet(), false}, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { var buf bytes.Buffer - n, err := tt.header.WriteTo(&buf) + n, err := tt.header.writeTo(&buf) if (err != nil) != tt.wantErr { t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) @@ -948,7 +855,7 @@ func TestPackageHeaderWriteTo(t *testing.T) { // Verify we can read it back var header PackageHeader - _, readErr := header.ReadFrom(&buf) + _, readErr := header.readFrom(&buf) if readErr != nil { t.Errorf("Failed to read back written data: %v", readErr) } @@ -980,44 +887,20 @@ func TestPackageHeaderRoundTrip(t *testing.T) { ArchivePartInfo: 0x00010001, }, }, - { - "Full header", - PackageHeader{ - Magic: NVPKMagic, - FormatVersion: FormatVersion, - Flags: 0x01FF, - PackageDataVersion: 100, - MetadataVersion: 50, - PackageCRC: 0xABCDEF00, - CreatedTime: 1638360000000000000, - ModifiedTime: 1638361000000000000, - LocaleID: 0x0409, - Reserved: 0, - AppID: 730, - VendorID: VendorIDSteam, - CreatorID: 0, - IndexStart: 8192, - IndexSize: 2048, - ArchiveChainID: 0x123456789ABCDEF0, - ArchivePartInfo: 0x0005000A, - CommentSize: 200, - CommentStart: 6144, - SignatureOffset: 10240, - }, - }, + {"Full header", packageHeaderRoundTripFull()}, } for _, tt := range tests { 
t.Run(tt.name, func(t *testing.T) { // Write var buf bytes.Buffer - if _, err := tt.header.WriteTo(&buf); err != nil { + if _, err := tt.header.writeTo(&buf); err != nil { t.Fatalf("WriteTo() error = %v", err) } // Read var header PackageHeader - if _, err := header.ReadFrom(&buf); err != nil { + if _, err := header.readFrom(&buf); err != nil { t.Fatalf("ReadFrom() error = %v", err) } @@ -1029,7 +912,7 @@ func TestPackageHeaderRoundTrip(t *testing.T) { } // Validate - if err := header.Validate(); err != nil { + if err := header.validate(); err != nil { t.Errorf("Round-trip header validation failed: %v", err) } }) @@ -1037,6 +920,8 @@ func TestPackageHeaderRoundTrip(t *testing.T) { } // TestPackageHeaderWriteToErrorPaths verifies WriteTo error handling +// +//nolint:gocognit // table-driven error paths func TestPackageHeaderWriteToErrorPaths(t *testing.T) { tests := []struct { name string @@ -1073,7 +958,7 @@ func TestPackageHeaderWriteToErrorPaths(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - _, err := tt.header.WriteTo(tt.writer) + _, err := tt.header.writeTo(tt.writer) if (err != nil) != tt.wantErr { t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) diff --git a/api/go/fileformat/testutil/helpers.go b/api/go/fileformat/testutil/helpers.go index 5047fb8a..2c9ba555 100644 --- a/api/go/fileformat/testutil/helpers.go +++ b/api/go/fileformat/testutil/helpers.go @@ -5,6 +5,7 @@ package testutil import ( + "encoding/binary" "os" "testing" @@ -49,12 +50,23 @@ func CreateTestPackageFile(t *testing.T, path string) { index.FirstEntryOffset = uint64(fileformat.PackageHeaderSize) header.IndexStart = uint64(fileformat.PackageHeaderSize) - header.IndexSize = uint64(index.Size()) + header.IndexSize = uint64(16 + int(index.EntryCount)*fileformat.IndexEntrySize) - if _, err := header.WriteTo(file); err != nil { + if err := binary.Write(file, binary.LittleEndian, header); err != nil { t.Fatalf("Failed to write header: %v", err) } - if 
_, err := index.WriteTo(file); err != nil { - t.Fatalf("Failed to write index: %v", err) + if err := binary.Write(file, binary.LittleEndian, index.EntryCount); err != nil { + t.Fatalf("Failed to write index entry count: %v", err) + } + if err := binary.Write(file, binary.LittleEndian, index.Reserved); err != nil { + t.Fatalf("Failed to write index reserved: %v", err) + } + if err := binary.Write(file, binary.LittleEndian, index.FirstEntryOffset); err != nil { + t.Fatalf("Failed to write index first entry offset: %v", err) + } + for i := range index.Entries { + if err := binary.Write(file, binary.LittleEndian, index.Entries[i]); err != nil { + t.Fatalf("Failed to write index entry %d: %v", i, err) + } } } diff --git a/api/go/fileformat/testutil/helpers_test.go b/api/go/fileformat/testutil/helpers_test.go index 7c0a47e6..da90a4e6 100644 --- a/api/go/fileformat/testutil/helpers_test.go +++ b/api/go/fileformat/testutil/helpers_test.go @@ -1,6 +1,7 @@ package testutil import ( + "encoding/binary" "os" "path/filepath" "testing" @@ -35,8 +36,7 @@ func TestCreateTestPackageFile(t *testing.T) { // Read and validate header header := fileformat.NewPackageHeader() - _, err = header.ReadFrom(file) - if err != nil { + if err := binary.Read(file, binary.LittleEndian, header); err != nil { t.Fatalf("Failed to read header from created file: %v", err) } @@ -47,9 +47,24 @@ func TestCreateTestPackageFile(t *testing.T) { // Read and validate file index index := fileformat.NewFileIndex() - _, err = index.ReadFrom(file) - if err != nil { - t.Fatalf("Failed to read file index from created file: %v", err) + if err := binary.Read(file, binary.LittleEndian, &index.EntryCount); err != nil { + t.Fatalf("Failed to read index entry count from created file: %v", err) + } + if err := binary.Read(file, binary.LittleEndian, &index.Reserved); err != nil { + t.Fatalf("Failed to read index reserved from created file: %v", err) + } + if err := binary.Read(file, binary.LittleEndian, 
&index.FirstEntryOffset); err != nil { + t.Fatalf("Failed to read index first entry offset from created file: %v", err) + } + if index.EntryCount > 0 { + index.Entries = make([]fileformat.IndexEntry, 0, index.EntryCount) + for i := uint32(0); i < index.EntryCount; i++ { + var entry fileformat.IndexEntry + if err := binary.Read(file, binary.LittleEndian, &entry); err != nil { + t.Fatalf("Failed to read index entry %d from created file: %v", i, err) + } + index.Entries = append(index.Entries, entry) + } } // Verify index has zero entries (empty package) diff --git a/api/go/generics/concurrency.go b/api/go/generics/concurrency.go index 7d38c805..9839e61a 100644 --- a/api/go/generics/concurrency.go +++ b/api/go/generics/concurrency.go @@ -244,40 +244,40 @@ func (p *WorkerPool[T]) GetWorkerStats() WorkerStats { return stats } +// processJob runs the job through the strategy (or pass-through) and sends the result. +func (w *Worker[T]) processJob(ctx context.Context, job Job[T]) { + if w.strategy == nil { + job.Result <- Ok(job.Data) + return + } + jobCtx, cancel := context.WithCancel(ctx) + go func() { + select { + case <-job.Context.Done(): + cancel() + case <-ctx.Done(): + cancel() + case <-jobCtx.Done(): + } + }() + result, err := w.strategy.Process(jobCtx, job.Data) + cancel() + if err != nil { + job.Result <- Err[T](err) + } else { + job.Result <- Ok(result) + } +} + // run is the main loop for a worker. 
func (w *Worker[T]) run(ctx context.Context) { for { select { case job := <-w.workChan: - if w.strategy != nil { - // Create a context derived from worker's context that also respects job's context - jobCtx, cancel := context.WithCancel(ctx) - go func() { - select { - case <-job.Context.Done(): - cancel() - case <-ctx.Done(): - cancel() - case <-jobCtx.Done(): - // Context already cancelled, exit - } - }() - result, err := w.strategy.Process(jobCtx, job.Data) - cancel() // Always cancel to clean up the goroutine - if err != nil { - job.Result <- Err[T](err) - } else { - job.Result <- Ok(result) - } - } else { - // No strategy, just pass through - job.Result <- Ok(job.Data) - } - + w.processJob(ctx, job) w.mu.Lock() w.stats.JobsProcessed++ w.mu.Unlock() - case <-w.done: return case <-ctx.Done(): diff --git a/api/go/generics/config_test.go b/api/go/generics/config_test.go index 84ac55f9..5313518f 100644 --- a/api/go/generics/config_test.go +++ b/api/go/generics/config_test.go @@ -17,58 +17,39 @@ func TestConfigBuilder(t *testing.T) { } } -// TestConfigBuilder_WithChunkSize tests WithChunkSize -func TestConfigBuilder_WithChunkSize(t *testing.T) { - builder := NewConfigBuilder[string]() - builder.WithChunkSize(1024) - - config := builder.Build() - if !config.ChunkSize.IsSet() { - t.Error("ChunkSize should be set") +func assertConfigOptionSet[T comparable](t *testing.T, opt Option[T], expected T, fieldName string) { + t.Helper() + if !opt.IsSet() { + t.Errorf("%s should be set", fieldName) } - chunkSize, ok := config.ChunkSize.Get() + val, ok := opt.Get() if !ok { - t.Error("ChunkSize should be retrievable") + t.Errorf("%s should be retrievable", fieldName) } - if chunkSize != 1024 { - t.Errorf("ChunkSize should be 1024, got %d", chunkSize) + if val != expected { + t.Errorf("%s should be %v, got %v", fieldName, expected, val) } } +// TestConfigBuilder_WithChunkSize tests WithChunkSize +func TestConfigBuilder_WithChunkSize(t *testing.T) { + builder := 
NewConfigBuilder[string]() + builder.WithChunkSize(1024) + assertConfigOptionSet(t, builder.Build().ChunkSize, 1024, "ChunkSize") +} + // TestConfigBuilder_WithMemoryUsage tests WithMemoryUsage func TestConfigBuilder_WithMemoryUsage(t *testing.T) { builder := NewConfigBuilder[int]() builder.WithMemoryUsage(2 * 1024 * 1024) - - config := builder.Build() - if !config.MaxMemoryUsage.IsSet() { - t.Error("MaxMemoryUsage should be set") - } - memoryUsage, ok := config.MaxMemoryUsage.Get() - if !ok { - t.Error("MaxMemoryUsage should be retrievable") - } - if memoryUsage != 2*1024*1024 { - t.Errorf("MaxMemoryUsage should be 2MB, got %d", memoryUsage) - } + assertConfigOptionSet(t, builder.Build().MaxMemoryUsage, 2*1024*1024, "MaxMemoryUsage") } // TestConfigBuilder_WithCompressionLevel tests WithCompressionLevel func TestConfigBuilder_WithCompressionLevel(t *testing.T) { builder := NewConfigBuilder[float64]() builder.WithCompressionLevel(5) - - config := builder.Build() - if !config.CompressionLevel.IsSet() { - t.Error("CompressionLevel should be set") - } - level, ok := config.CompressionLevel.Get() - if !ok { - t.Error("CompressionLevel should be retrievable") - } - if level != 5 { - t.Errorf("CompressionLevel should be 5, got %d", level) - } + assertConfigOptionSet(t, builder.Build().CompressionLevel, 5, "CompressionLevel") } // TestConfigBuilder_WithStrategy tests WithStrategy diff --git a/api/go/generics/core_test.go b/api/go/generics/core_test.go index 792a47d5..0c505ff8 100644 --- a/api/go/generics/core_test.go +++ b/api/go/generics/core_test.go @@ -5,96 +5,50 @@ import ( "testing" ) -// TestOption_String tests Option with string type -func TestOption_String(t *testing.T) { - var opt Option[string] - - // Initially not set +func runOptionLifecycleTest[T comparable](t *testing.T, setVal, defaultVal, zeroVal T) { + t.Helper() + var opt Option[T] if opt.IsSet() { t.Error("Option should not be set initially") } - val, ok := opt.Get() if ok { t.Error("Get should return 
false when not set") } - if val != "" { - t.Errorf("Get should return zero value, got %q", val) + if val != zeroVal { + t.Errorf("Get should return zero value, got %v", val) } - - // Set value - opt.Set("hello") + opt.Set(setVal) if !opt.IsSet() { t.Error("Option should be set after Set") } - val, ok = opt.Get() if !ok { t.Error("Get should return true when set") } - if val != "hello" { - t.Errorf("Get should return set value, got %q", val) + if val != setVal { + t.Errorf("Get should return set value, got %v", val) } - - // GetOrDefault - if opt.GetOrDefault("default") != "hello" { + if opt.GetOrDefault(defaultVal) != setVal { t.Error("GetOrDefault should return set value") } - - // Clear opt.Clear() if opt.IsSet() { t.Error("Option should not be set after Clear") } - if opt.GetOrDefault("default") != "default" { + if opt.GetOrDefault(defaultVal) != defaultVal { t.Error("GetOrDefault should return default after Clear") } } +// TestOption_String tests Option with string type +func TestOption_String(t *testing.T) { + runOptionLifecycleTest(t, "hello", "default", "") +} + // TestOption_Int tests Option with int type func TestOption_Int(t *testing.T) { - var opt Option[int] - - // Initially not set - if opt.IsSet() { - t.Error("Option should not be set initially") - } - - val, ok := opt.Get() - if ok { - t.Error("Get should return false when not set") - } - if val != 0 { - t.Errorf("Get should return zero value, got %d", val) - } - - // Set value - opt.Set(42) - if !opt.IsSet() { - t.Error("Option should be set after Set") - } - - val, ok = opt.Get() - if !ok { - t.Error("Get should return true when set") - } - if val != 42 { - t.Errorf("Get should return set value, got %d", val) - } - - // GetOrDefault - if opt.GetOrDefault(0) != 42 { - t.Error("GetOrDefault should return set value") - } - - // Clear - opt.Clear() - if opt.IsSet() { - t.Error("Option should not be set after Clear") - } - if opt.GetOrDefault(100) != 100 { - t.Error("GetOrDefault should return default 
after Clear") - } + runOptionLifecycleTest(t, 42, 100, 0) } // CustomType is a custom type for testing @@ -151,35 +105,30 @@ func TestOption_CustomType(t *testing.T) { } } -// TestResult_String tests Result with string type -func TestResult_String(t *testing.T) { - // Ok result - result := Ok("success") +func runResultLifecycleTest[T comparable](t *testing.T, okVal, zeroVal T) { + t.Helper() + result := Ok(okVal) if !result.IsOk() { t.Error("Result should be Ok") } if result.IsErr() { t.Error("Result should not be Err") } - val, err := result.Unwrap() if err != nil { t.Errorf("Unwrap should return nil error for Ok, got %v", err) } - if val != "success" { - t.Errorf("Unwrap should return value, got %q", val) + if val != okVal { + t.Errorf("Unwrap should return value, got %v", val) } - - // Err result testErr := errors.New("test error") - result = Err[string](testErr) + result = Err[T](testErr) if result.IsOk() { t.Error("Result should not be Ok") } if !result.IsErr() { t.Error("Result should be Err") } - val, err = result.Unwrap() if err == nil { t.Error("Unwrap should return error for Err") @@ -187,50 +136,19 @@ func TestResult_String(t *testing.T) { if err != testErr { t.Errorf("Unwrap should return set error, got %v", err) } - if val != "" { - t.Errorf("Unwrap should return zero value for Err, got %q", val) + if val != zeroVal { + t.Errorf("Unwrap should return zero value for Err, got %v", val) } } +// TestResult_String tests Result with string type +func TestResult_String(t *testing.T) { + runResultLifecycleTest(t, "success", "") +} + // TestResult_Int tests Result with int type func TestResult_Int(t *testing.T) { - // Ok result - result := Ok(42) - if !result.IsOk() { - t.Error("Result should be Ok") - } - if result.IsErr() { - t.Error("Result should not be Err") - } - - val, err := result.Unwrap() - if err != nil { - t.Errorf("Unwrap should return nil error for Ok, got %v", err) - } - if val != 42 { - t.Errorf("Unwrap should return value, got %d", val) - } - - 
// Err result - testErr := errors.New("test error") - result = Err[int](testErr) - if result.IsOk() { - t.Error("Result should not be Ok") - } - if !result.IsErr() { - t.Error("Result should be Err") - } - - val, err = result.Unwrap() - if err == nil { - t.Error("Unwrap should return error for Err") - } - if err != testErr { - t.Errorf("Unwrap should return set error, got %v", err) - } - if val != 0 { - t.Errorf("Unwrap should return zero value for Err, got %d", val) - } + runResultLifecycleTest(t, 42, 0) } // TestResult_CustomType tests Result with custom type diff --git a/api/go/generics/pathentry_test.go b/api/go/generics/pathentry_test.go index d3bd0b0f..0c260ade 100644 --- a/api/go/generics/pathentry_test.go +++ b/api/go/generics/pathentry_test.go @@ -105,6 +105,8 @@ func TestPathEntryValidation(t *testing.T) { // TestPathEntryReadFrom verifies ReadFrom deserialization // Specification: package_file_format.md: 4.1.4.2 Path Entries +// +//nolint:gocognit // table-driven test func TestPathEntryReadFrom(t *testing.T) { tests := []struct { name string @@ -238,6 +240,8 @@ func TestPathEntryReadFromEmptyPath(t *testing.T) { // TestPathEntryWriteTo verifies WriteTo serialization // Specification: package_file_format.md: 4.1.4.2 Path Entries +// +//nolint:gocognit // table-driven test func TestPathEntryWriteTo(t *testing.T) { tests := []struct { name string @@ -304,6 +308,8 @@ func TestPathEntryWriteTo(t *testing.T) { } // TestPathEntryRoundTrip verifies round-trip serialization +// +//nolint:gocognit // table-driven round-trip func TestPathEntryRoundTrip(t *testing.T) { tests := []struct { name string @@ -366,6 +372,8 @@ func TestPathEntryRoundTrip(t *testing.T) { } // TestPathEntryWriteToErrorPaths verifies WriteTo error handling +// +//nolint:gocognit // table-driven error paths func TestPathEntryWriteToErrorPaths(t *testing.T) { tests := []struct { name string diff --git a/api/go/generics/patterns_test.go b/api/go/generics/patterns_test.go index 93dd8faa..12153e8b 
100644 --- a/api/go/generics/patterns_test.go +++ b/api/go/generics/patterns_test.go @@ -93,7 +93,7 @@ func TestValidationRule_String(t *testing.T) { // Valid case rule := &ValidationRule[string]{ Name: "non-empty", - Predicate: func(s string) bool { return len(s) > 0 }, + Predicate: func(s string) bool { return s != "" }, Message: "string cannot be empty", } diff --git a/api/go/generics/tag.go b/api/go/generics/tag.go index 91d72f71..86655e32 100644 --- a/api/go/generics/tag.go +++ b/api/go/generics/tag.go @@ -9,7 +9,7 @@ package generics // TagValueType represents the type of a tag value. // -// Specification: api_file_mgmt_file_entry.md: 13. TagValueType Type +// Specification: api_file_mgmt_file_entry.md: 14 14. TagValueType Type type TagValueType uint8 const ( diff --git a/api/go/generics/tag_test.go b/api/go/generics/tag_test.go index f0a35e6d..0d8fec3e 100644 --- a/api/go/generics/tag_test.go +++ b/api/go/generics/tag_test.go @@ -4,6 +4,33 @@ import ( "testing" ) +func assertTagKeyAndType(t *testing.T, tag interface{}, key string, tagType TagValueType) { + t.Helper() + if tag == nil { + t.Fatal("NewTag() returned nil") + } + switch tagVal := tag.(type) { + case *Tag[string]: + if tagVal.Key != key || tagVal.Type != tagType { + t.Errorf("NewTag() tag.Key = %q, tag.Type = %v; want Key %q Type %v", tagVal.Key, tagVal.Type, key, tagType) + } + case *Tag[int64]: + if tagVal.Key != key || tagVal.Type != tagType { + t.Errorf("NewTag() tag.Key = %q, tag.Type = %v; want Key %q Type %v", tagVal.Key, tagVal.Type, key, tagType) + } + case *Tag[float64]: + if tagVal.Key != key || tagVal.Type != tagType { + t.Errorf("NewTag() tag.Key = %q, tag.Type = %v; want Key %q Type %v", tagVal.Key, tagVal.Type, key, tagType) + } + case *Tag[bool]: + if tagVal.Key != key || tagVal.Type != tagType { + t.Errorf("NewTag() tag.Key = %q, tag.Type = %v; want Key %q Type %v", tagVal.Key, tagVal.Type, key, tagType) + } + default: + t.Errorf("NewTag() returned unexpected type: %T", tag) + } 
+} + // TestNewTag tests NewTag factory function func TestNewTag(t *testing.T) { tests := []struct { @@ -30,7 +57,6 @@ func TestNewTag(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { var tag interface{} - switch v := tt.value.(type) { case string: tag = NewTag[string](tt.key, v, tt.tagType) @@ -41,43 +67,7 @@ func TestNewTag(t *testing.T) { case bool: tag = NewTag[bool](tt.key, v, tt.tagType) } - - if tag == nil { - t.Fatal("NewTag() returned nil") - } - // Use type assertion to check tag properties - switch tagVal := tag.(type) { - case *Tag[string]: - if tagVal.Key != tt.key { - t.Errorf("NewTag() tag.Key = %q, want %q", tagVal.Key, tt.key) - } - if tagVal.Type != tt.tagType { - t.Errorf("NewTag() tag.Type = %v, want %v", tagVal.Type, tt.tagType) - } - case *Tag[int64]: - if tagVal.Key != tt.key { - t.Errorf("NewTag() tag.Key = %q, want %q", tagVal.Key, tt.key) - } - if tagVal.Type != tt.tagType { - t.Errorf("NewTag() tag.Type = %v, want %v", tagVal.Type, tt.tagType) - } - case *Tag[float64]: - if tagVal.Key != tt.key { - t.Errorf("NewTag() tag.Key = %q, want %q", tagVal.Key, tt.key) - } - if tagVal.Type != tt.tagType { - t.Errorf("NewTag() tag.Type = %v, want %v", tagVal.Type, tt.tagType) - } - case *Tag[bool]: - if tagVal.Key != tt.key { - t.Errorf("NewTag() tag.Key = %q, want %q", tagVal.Key, tt.key) - } - if tagVal.Type != tt.tagType { - t.Errorf("NewTag() tag.Type = %v, want %v", tagVal.Type, tt.tagType) - } - default: - t.Errorf("NewTag() returned unexpected type: %T", tag) - } + assertTagKeyAndType(t, tag, tt.key, tt.tagType) }) } } diff --git a/api/go/generics/validation.go b/api/go/generics/validation.go index 6b60c358..8e281e76 100644 --- a/api/go/generics/validation.go +++ b/api/go/generics/validation.go @@ -91,58 +91,57 @@ func ValidateWith[T any](ctx context.Context, value T, validator Validator[T]) e // errors := ValidateAll(ctx, []int{1, -1, 2, -2}, validator) func ValidateAll[T any](ctx context.Context, values []T, 
validator Validator[T]) []error { if validator == nil { - err := pkgerrors.NewTypedPackageError(pkgerrors.ErrTypeValidation, "validator is nil", nil, pkgerrors.ValidationErrorContext{ - Field: "validator", - Value: nil, - Expected: "non-nil validator", - }) - errors := make([]error, len(values)) - for i := range errors { - errors[i] = err - } - return errors + return validateAllNilValidator(values) } - var validationErrors []error for i, value := range values { - // Check context cancellation before each validation - if ctx != nil { - select { - case <-ctx.Done(): - // Return context error for remaining values - ctxErr := pkgerrors.WrapErrorWithContext(ctx.Err(), pkgerrors.ErrTypeContext, "validation cancelled", pkgerrors.ValidationErrorContext{ - Field: fmt.Sprintf("values[%d]", i), - Value: value, - Expected: "validation completed", - }) - // Add context error for this and all remaining values - for j := i; j < len(values); j++ { - validationErrors = append(validationErrors, ctxErr) - } - return validationErrors - default: + if ctxErr := validateAllCheckContext(ctx, i, value); ctxErr != nil { + for j := i; j < len(values); j++ { + validationErrors = append(validationErrors, ctxErr) } + return validationErrors } - if err := validator.Validate(value); err != nil { - // If error is already a PackageError, use it - if pkgErr, ok := pkgerrors.IsPackageError(err); ok { - validationErrors = append(validationErrors, pkgErr) - } else { - // Wrap error with ValidationErrorContext - wrappedErr := pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeValidation, fmt.Sprintf("validation failed at index %d", i), pkgerrors.ValidationErrorContext{ - Field: fmt.Sprintf("values[%d]", i), - Value: value, - Expected: "valid value", - }) - validationErrors = append(validationErrors, wrappedErr) - } + validationErrors = append(validationErrors, validateAllWrapError(err, i, value)) } } - return validationErrors } +func validateAllNilValidator[T any](values []T) []error { + err := 
pkgerrors.NewTypedPackageError(pkgerrors.ErrTypeValidation, "validator is nil", nil, pkgerrors.ValidationErrorContext{ + Field: "validator", Value: nil, Expected: "non-nil validator", + }) + out := make([]error, len(values)) + for i := range out { + out[i] = err + } + return out +} + +func validateAllCheckContext[T any](ctx context.Context, i int, value T) error { + if ctx == nil { + return nil + } + select { + case <-ctx.Done(): + return pkgerrors.WrapErrorWithContext(ctx.Err(), pkgerrors.ErrTypeContext, "validation cancelled", pkgerrors.ValidationErrorContext{ + Field: fmt.Sprintf("values[%d]", i), Value: value, Expected: "validation completed", + }) + default: + return nil + } +} + +func validateAllWrapError[T any](err error, i int, value T) error { + if pkgErr, ok := pkgerrors.IsPackageError(err); ok { + return pkgErr + } + return pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeValidation, fmt.Sprintf("validation failed at index %d", i), pkgerrors.ValidationErrorContext{ + Field: fmt.Sprintf("values[%d]", i), Value: value, Expected: "valid value", + }) +} + // ComposeValidators creates a validator that runs multiple validators. 
// // ComposeValidators[T] creates a composite validator that runs all provided diff --git a/api/go/generics/validation_test.go b/api/go/generics/validation_test.go index a15a3ec1..575057b1 100644 --- a/api/go/generics/validation_test.go +++ b/api/go/generics/validation_test.go @@ -39,7 +39,7 @@ func TestValidateWith_DifferentTypes(t *testing.T) { // String type rule := &ValidationRule[string]{ - Predicate: func(s string) bool { return len(s) > 0 }, + Predicate: func(s string) bool { return s != "" }, Message: "string cannot be empty", } err := ValidateWith(ctx, "test", rule) @@ -121,7 +121,7 @@ func TestValidateAll_MixedResults(t *testing.T) { func TestComposeValidators(t *testing.T) { // Create two validators validator1 := &ValidationRule[string]{ - Predicate: func(s string) bool { return len(s) > 0 }, + Predicate: func(s string) bool { return s != "" }, Message: "string cannot be empty", } validator2 := &ValidationRule[string]{ @@ -169,7 +169,7 @@ func TestComposeValidators_Empty(t *testing.T) { func TestComposeValidators_NilValidators(t *testing.T) { validator := &ValidationRule[string]{ - Predicate: func(s string) bool { return len(s) > 0 }, + Predicate: func(s string) bool { return s != "" }, Message: "string cannot be empty", } @@ -190,7 +190,7 @@ func TestComposeValidators_NilValidators(t *testing.T) { func TestComposeValidators_MultipleTypes(t *testing.T) { // String validators strValidator1 := &ValidationRule[string]{ - Predicate: func(s string) bool { return len(s) > 0 }, + Predicate: func(s string) bool { return s != "" }, Message: "string cannot be empty", } strValidator2 := &ValidationRule[string]{ diff --git a/api/go/internal/helpers.go b/api/go/internal/helpers.go index 3684985c..1632aaf2 100644 --- a/api/go/internal/helpers.go +++ b/api/go/internal/helpers.go @@ -26,6 +26,7 @@ package internal import ( "context" + "encoding/binary" "fmt" "hash/crc32" "io" @@ -115,10 +116,8 @@ func ReadAndValidateHeader(ctx context.Context, reader io.Reader) 
(*fileformat.P return nil, err } - header := fileformat.NewPackageHeader() - _, err := header.ReadFrom(reader) + header, err := readPackageHeader(reader) if err != nil { - // Check if this is a magic number error from ReadFrom errMsg := err.Error() if strings.Contains(errMsg, "magic") { return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, errMsg, err, struct{}{}) @@ -126,16 +125,69 @@ func ReadAndValidateHeader(ctx context.Context, reader io.Reader) (*fileformat.P return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "failed to read package header", err, struct{}{}) } - // Note: ReadFrom already validates the magic number, so we don't need to validate it again. - - // Validate header structure - if err := header.Validate(); err != nil { + if err := validatePackageHeader(header); err != nil { return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "invalid package header", err, struct{}{}) } return header, nil } +func readPackageHeader(reader io.Reader) (*fileformat.PackageHeader, error) { + header := fileformat.NewPackageHeader() + if err := binary.Read(reader, binary.LittleEndian, header); err != nil { + if err == io.EOF || err == io.ErrUnexpectedEOF { + return nil, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, fmt.Sprintf("failed to read header: incomplete data (expected %d bytes)", fileformat.PackageHeaderSize), pkgerrors.ValidationErrorContext{ + Field: "Header", + Value: nil, + Expected: fmt.Sprintf("%d bytes", fileformat.PackageHeaderSize), + }) + } + return nil, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read header", pkgerrors.ValidationErrorContext{ + Field: "Header", + Value: nil, + Expected: fmt.Sprintf("%d bytes", fileformat.PackageHeaderSize), + }) + } + + if header.Magic != fileformat.NVPKMagic { + return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "invalid magic number", nil, pkgerrors.ValidationErrorContext{ + Field: "Magic", + Value: 
fmt.Sprintf("0x%08X", header.Magic), + Expected: fmt.Sprintf("0x%08X", fileformat.NVPKMagic), + }) + } + + return header, nil +} + +func validatePackageHeader(header *fileformat.PackageHeader) error { + if header == nil { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "package header is nil", nil, struct{}{}) + } + if header.Magic != fileformat.NVPKMagic { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "invalid magic number", nil, pkgerrors.ValidationErrorContext{ + Field: "Magic", + Value: fmt.Sprintf("0x%08X", header.Magic), + Expected: fmt.Sprintf("0x%08X", fileformat.NVPKMagic), + }) + } + if header.FormatVersion != fileformat.FormatVersion { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "unsupported format version", nil, pkgerrors.ValidationErrorContext{ + Field: "FormatVersion", + Value: header.FormatVersion, + Expected: fmt.Sprintf("%d", fileformat.FormatVersion), + }) + } + if header.Reserved != 0 { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "reserved field must be 0", nil, pkgerrors.ValidationErrorContext{ + Field: "Reserved", + Value: header.Reserved, + Expected: "0", + }) + } + return nil +} + // LoadFileEntry loads a FileEntry from the specified offset in the file. // Returns the FileEntry or an error if loading fails. 
func LoadFileEntry(file *os.File, offset uint64) (*metadata.FileEntry, error) { @@ -186,7 +238,7 @@ func LoadFileEntry(file *os.File, offset uint64) (*metadata.FileEntry, error) { // - []string: Canonical path segments // - error: Validation error if path would escape root or result in empty path // -// Specification: api_core.md: 1.1.2 Package Path Semantics +// Specification: api_core.md: 2.1.3 Dot Segment Canonicalization func canonicalizePathSegments(path string) ([]string, error) { // Split path by separator segments := strings.Split(path, "/") @@ -238,7 +290,7 @@ func canonicalizePathSegments(path string) ([]string, error) { // - string: Normalized canonical path with leading '/' // - error: Validation error if path is invalid or would escape root // -// Specification: api_core.md: 1.1.2 Package Path Semantics +// Specification: api_core.md: 12.1 NormalizePackagePath Function func NormalizePackagePath(path string) (string, error) { if path == "" { return "", pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "path cannot be empty", nil, pkgerrors.ValidationErrorContext{ @@ -328,7 +380,7 @@ func NormalizePackagePath(path string) (string, error) { // Returns: // - string: Path in display format (without leading "/") // -// Specification: api_core.md: 1.1.2 Package Path Semantics +// Specification: api_core.md: 12.2 ToDisplayPath Function func ToDisplayPath(storedPath string) string { // Strip leading "/" for display // Users should see relative paths, not package-root-prefixed paths @@ -355,7 +407,7 @@ func ToDisplayPath(storedPath string) string { // - []string: Warning messages (empty slice if no warnings) // - error: Only if path exceeds absolute maximum (32,767 bytes) // -// Specification: file_validation.md: 1. 
File Validation Requirements +// Specification: api_core.md: 12.4 ValidatePathLength Function func ValidatePathLength(path string) ([]string, error) { pathLen := len(path) // UTF-8 byte length @@ -403,7 +455,7 @@ func ValidatePathLength(path string) ([]string, error) { // Returns: // - error: Validation error if path is invalid, nil if valid // -// Specification: api_core.md: 1.1.2 Package Path Semantics +// Specification: api_core.md: 12.3 ValidatePackagePath Function func ValidatePackagePath(path string) error { // Reject empty path if path == "" { diff --git a/api/go/internal/helpers_test.go b/api/go/internal/helpers_test.go index a5061437..12d9c18f 100644 --- a/api/go/internal/helpers_test.go +++ b/api/go/internal/helpers_test.go @@ -5,6 +5,7 @@ package internal import ( "context" + "encoding/binary" "os" "path/filepath" "runtime" @@ -20,6 +21,26 @@ import ( "github.com/novus-engine/novuspack/api/go/pkgerrors" ) +func assertPathValidationError(t *testing.T, err error, shouldError bool, errorType pkgerrors.ErrorType, fnName string) { + t.Helper() + if shouldError { + if err == nil { + t.Errorf("%s() expected error, got nil", fnName) + return + } + var pkgErr *pkgerrors.PackageError + if !pkgerrors.As(err, &pkgErr) { + t.Errorf("%s() error is not a PackageError: %v", fnName, err) + return + } + if pkgErr.Type != errorType { + t.Errorf("%s() error type = %v, want %v", fnName, pkgErr.Type, errorType) + } + } else if err != nil { + t.Errorf("%s() unexpected error: %v", fnName, err) + } +} + // TestValidatePath tests the ValidatePath function with various inputs. 
func TestValidatePath(t *testing.T) { tests := []struct { @@ -66,29 +87,14 @@ func TestValidatePath(t *testing.T) { t.Run(tt.name, func(t *testing.T) { ctx := context.Background() err := ValidatePath(ctx, tt.path) - if tt.shouldError { - if err == nil { - t.Errorf("ValidatePath() expected error, got nil") - return - } - var pkgErr *pkgerrors.PackageError - if !pkgerrors.As(err, &pkgErr) { - t.Errorf("ValidatePath() error is not a PackageError: %v", err) - return - } - if pkgErr.Type != tt.errorType { - t.Errorf("ValidatePath() error type = %v, want %v", pkgErr.Type, tt.errorType) - } - } else { - if err != nil { - t.Errorf("ValidatePath() unexpected error: %v", err) - } - } + assertPathValidationError(t, err, tt.shouldError, tt.errorType, "ValidatePath") }) } } // TestCheckContext tests the CheckContext function with various context scenarios. +// +//nolint:gocognit // table-driven context cases func TestCheckContext(t *testing.T) { tests := []struct { name string @@ -168,16 +174,16 @@ func TestCheckContext(t *testing.T) { t.Errorf("CheckContext() error should be a PackageError") } } - } else { - if err != nil { - t.Errorf("CheckContext() unexpected error: %v", err) - } + } else if err != nil { + t.Errorf("CheckContext() unexpected error: %v", err) } }) } } // TestOpenFileForReading tests the OpenFileForReading function. 
+// +//nolint:gocognit // table-driven file/open cases func TestOpenFileForReading(t *testing.T) { // Create a temporary file for testing tmpDir := t.TempDir() @@ -326,11 +332,11 @@ func TestOpenFileForReading_PermissionError(t *testing.T) { _ = file.Close() // Remove read permission - if err := os.Chmod(path, 0000); err != nil { + if err := os.Chmod(path, 0o000); err != nil { t.Fatalf("Failed to chmod file: %v", err) } defer func() { - _ = os.Chmod(path, 0644) // Restore permissions for cleanup + _ = os.Chmod(path, 0o644) // Restore permissions for cleanup }() // Try to open the file - should fail with permission error @@ -354,6 +360,8 @@ func TestOpenFileForReading_PermissionError(t *testing.T) { } // TestReadAndValidateHeader tests the ReadAndValidateHeader function. +// +//nolint:gocognit // table-driven header read/validate cases func TestReadAndValidateHeader(t *testing.T) { tmpDir := t.TempDir() @@ -372,7 +380,7 @@ func TestReadAndValidateHeader(t *testing.T) { t.Fatalf("Failed to create test file: %v", err) } header := fileformat.NewPackageHeader() - _, _ = header.WriteTo(file) + _ = writeTestHeader(t, file, header) _, _ = file.Seek(0, 0) return file }, @@ -418,7 +426,7 @@ func TestReadAndValidateHeader(t *testing.T) { } // Write valid magic but invalid rest of header header := fileformat.NewPackageHeader() - _, _ = header.WriteTo(file) + _ = writeTestHeader(t, file, header) // Corrupt the file by truncating it _, _ = file.Seek(0, 0) _ = file.Truncate(10) // Truncate to invalid size @@ -482,7 +490,7 @@ func TestReadAndValidateHeader_ValidateError(t *testing.T) { header := fileformat.NewPackageHeader() header.Magic = fileformat.NVPKMagic header.FormatVersion = 999 // Invalid version that will fail validation - _, _ = header.WriteTo(file) + _ = writeTestHeader(t, file, header) _, _ = file.Seek(0, 0) _ = file.Close() @@ -646,6 +654,8 @@ func TestReadAndValidateHeader_ReadFromGenericError(t *testing.T) { } // TestNormalizePackagePath tests the 
NormalizePackagePath function. +// +//nolint:gocognit // table-driven path cases func TestNormalizePackagePath(t *testing.T) { tests := []struct { name string @@ -824,28 +834,45 @@ func TestValidatePackagePath(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { err := ValidatePackagePath(tt.path) - if tt.shouldError { - if err == nil { - t.Errorf("ValidatePackagePath() expected error, got nil") - return - } - var pkgErr *pkgerrors.PackageError - if !pkgerrors.As(err, &pkgErr) { - t.Errorf("ValidatePackagePath() error is not a PackageError: %v", err) - return - } - if pkgErr.Type != tt.errorType { - t.Errorf("ValidatePackagePath() error type = %v, want %v", pkgErr.Type, tt.errorType) - } - } else { - if err != nil { - t.Errorf("ValidatePackagePath() unexpected error: %v", err) - } - } + assertPathValidationError(t, err, tt.shouldError, tt.errorType, "ValidatePackagePath") }) } } +func writeTestHeader(t *testing.T, file *os.File, header *fileformat.PackageHeader) error { + t.Helper() + if err := binary.Write(file, binary.LittleEndian, header); err != nil { + return err + } + return nil +} + +func writeTestIndex(t *testing.T, file *os.File, index *fileformat.FileIndex) error { + t.Helper() + if err := binary.Write(file, binary.LittleEndian, index.EntryCount); err != nil { + return err + } + if err := binary.Write(file, binary.LittleEndian, index.Reserved); err != nil { + return err + } + if err := binary.Write(file, binary.LittleEndian, index.FirstEntryOffset); err != nil { + return err + } + for i := range index.Entries { + if err := binary.Write(file, binary.LittleEndian, index.Entries[i]); err != nil { + return err + } + } + return nil +} + +func testFileIndexSize(index *fileformat.FileIndex) uint64 { + if index == nil { + return 0 + } + return uint64(16 + len(index.Entries)*fileformat.IndexEntrySize) +} + // TestLoadFileEntry tests the LoadFileEntry function. 
func TestLoadFileEntry(t *testing.T) { tmpDir := t.TempDir() @@ -861,20 +888,20 @@ func TestLoadFileEntry(t *testing.T) { header := fileformat.NewPackageHeader() header.IndexStart = uint64(fileformat.PackageHeaderSize) header.IndexSize = 16 // Empty index size - if _, err := header.WriteTo(file); err != nil { + if err := writeTestHeader(t, file, header); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } // Write empty index index := fileformat.NewFileIndex() - if _, err := index.WriteTo(file); err != nil { + if err := writeTestIndex(t, file, index); err != nil { _ = file.Close() t.Fatalf("Failed to write index: %v", err) } // Calculate file entry offset (after header and index) - entryOffset := uint64(fileformat.PackageHeaderSize) + uint64(index.Size()) + entryOffset := uint64(fileformat.PackageHeaderSize) + testFileIndexSize(index) // Write a minimal file entry entry := metadata.NewFileEntry() @@ -991,6 +1018,8 @@ func TestLoadFileEntry_InvalidEntry(t *testing.T) { // TestNormalizePackagePath_Canonicalization tests path canonicalization with dot segments. // These tests verify that dot segments are properly canonicalized rather than rejected. // This is the TDD Red Phase - these tests will fail until canonicalization is implemented. +// +//nolint:gocognit // table-driven canonicalization cases func TestNormalizePackagePath_Canonicalization(t *testing.T) { tests := []struct { name string @@ -1277,6 +1306,8 @@ func TestNormalizePackagePath_UnicodeNormalization(t *testing.T) { } // TestValidatePathLength tests the ValidatePathLength function. 
+// +//nolint:gocognit // table-driven length/validation cases func TestValidatePathLength(t *testing.T) { // Helper to generate path of specific length generatePath := func(length int) string { @@ -1395,10 +1426,8 @@ func TestValidatePathLength(t *testing.T) { if len(warnings) != tt.warningCount { t.Errorf("ValidatePathLength() warning count = %d, want %d", len(warnings), tt.warningCount) } - } else { - if len(warnings) != 0 { - t.Errorf("ValidatePathLength() unexpected warnings for %d byte path: %v", tt.pathLength, warnings) - } + } else if len(warnings) != 0 { + t.Errorf("ValidatePathLength() unexpected warnings for %d byte path: %v", tt.pathLength, warnings) } } }) @@ -1456,24 +1485,7 @@ func TestValidatePackagePath_Canonicalization(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { err := ValidatePackagePath(tt.path) - if tt.shouldError { - if err == nil { - t.Errorf("ValidatePackagePath() expected error, got nil") - return - } - var pkgErr *pkgerrors.PackageError - if !pkgerrors.As(err, &pkgErr) { - t.Errorf("ValidatePackagePath() error is not a PackageError: %v", err) - return - } - if pkgErr.Type != tt.errorType { - t.Errorf("ValidatePackagePath() error type = %v, want %v", pkgErr.Type, tt.errorType) - } - } else { - if err != nil { - t.Errorf("ValidatePackagePath() unexpected error: %v", err) - } - } + assertPathValidationError(t, err, tt.shouldError, tt.errorType, "ValidatePackagePath") }) } } diff --git a/api/go/internal/testhelpers/context_test.go b/api/go/internal/testhelpers/context_test.go index ed4b1082..528b557a 100644 --- a/api/go/internal/testhelpers/context_test.go +++ b/api/go/internal/testhelpers/context_test.go @@ -6,6 +6,20 @@ import ( "time" ) +// runContextDoneTest verifies ctx is done and Err() matches wantErr; fails t on timeout or wrong error. 
+func runContextDoneTest(t *testing.T, ctx context.Context, wantErr error, timeoutMsg string) { + t.Helper() + select { + case <-ctx.Done(): + err := ctx.Err() + if err != wantErr { + t.Errorf("expected %v error, got %v", wantErr, err) + } + case <-time.After(100 * time.Millisecond): + t.Error(timeoutMsg) + } +} + func TestCancelledContext(t *testing.T) { t.Run("returns cancelled context", func(t *testing.T) { ctx := CancelledContext() @@ -29,18 +43,7 @@ func TestCancelledContext(t *testing.T) { }) t.Run("context error in operations", func(t *testing.T) { - ctx := CancelledContext() - - // Simulate using the context in an operation - select { - case <-ctx.Done(): - err := ctx.Err() - if err != context.Canceled { - t.Errorf("expected context.Canceled error, got %v", err) - } - case <-time.After(100 * time.Millisecond): - t.Error("context should be immediately cancelled") - } + runContextDoneTest(t, CancelledContext(), context.Canceled, "context should be immediately cancelled") }) } @@ -67,18 +70,7 @@ func TestTimeoutContext(t *testing.T) { }) t.Run("context deadline exceeded in operations", func(t *testing.T) { - ctx := TimeoutContext() - - // Simulate using the context in an operation - select { - case <-ctx.Done(): - err := ctx.Err() - if err != context.DeadlineExceeded { - t.Errorf("expected context.DeadlineExceeded error, got %v", err) - } - case <-time.After(100 * time.Millisecond): - t.Error("context should be timed out") - } + runContextDoneTest(t, TimeoutContext(), context.DeadlineExceeded, "context should be timed out") }) t.Run("deadline has passed", func(t *testing.T) { diff --git a/api/go/internal/testhelpers/io_test.go b/api/go/internal/testhelpers/io_test.go index b587bcb3..7ec7fc66 100644 --- a/api/go/internal/testhelpers/io_test.go +++ b/api/go/internal/testhelpers/io_test.go @@ -2,9 +2,40 @@ package testhelpers import ( "errors" + "io" "testing" ) +// assertReadResult checks n and err from a Read call; fails t if err != nil or n != wantN. 
+func assertReadResult(t *testing.T, n int, err error, wantN int, op string) { + t.Helper() + if err != nil { + t.Errorf("%s should succeed, got error: %v", op, err) + } + if n != wantN { + t.Errorf("%s: expected %d bytes, got %d", op, wantN, n) + } +} + +// testWriterTwoWrites performs two writes and checks n and err for each; used to reduce dupl in tests. +func testWriterTwoWrites(t *testing.T, w io.Writer, first, second []byte, wantFirstN, wantSecondN int) { + t.Helper() + n, err := w.Write(first) + if err != nil { + t.Errorf("write should succeed, got error: %v", err) + } + if n != wantFirstN { + t.Errorf("expected %d bytes written, got %d", wantFirstN, n) + } + n, err = w.Write(second) + if err != nil { + t.Errorf("second write should succeed, got error: %v", err) + } + if n != wantSecondN { + t.Errorf("expected %d bytes written, got %d", wantSecondN, n) + } +} + func TestErrorWriter(t *testing.T) { t.Run("default error", func(t *testing.T) { w := NewErrorWriter() @@ -40,50 +71,31 @@ func TestFailingWriter(t *testing.T) { }) t.Run("fails after limit", func(t *testing.T) { - w := NewFailingWriter(5) - _, _ = w.Write([]byte("test")) - n, err := w.Write([]byte("more")) - if err == nil { - t.Error("second write should fail") - } - // Should write 1 byte (5-4=1) then fail - if n != 1 { - t.Errorf("expected 1 byte written, got %d", n) - } + testFailingWriterAtLimit(t, 5, 1) }) t.Run("fails immediately when at limit", func(t *testing.T) { - w := NewFailingWriter(4) - _, _ = w.Write([]byte("test")) - n, err := w.Write([]byte("more")) - if err == nil { - t.Error("write at limit should fail") - } - if n != 0 { - t.Errorf("expected 0 bytes written, got %d", n) - } + testFailingWriterAtLimit(t, 4, 0) }) } +// testFailingWriterAtLimit writes "test" then "more" to a FailingWriter with the given limit; expects write to fail and n to match expectedN. 
+func testFailingWriterAtLimit(t *testing.T, limit, expectedN int) { + t.Helper() + w := NewFailingWriter(limit) + _, _ = w.Write([]byte("test")) + n, err := w.Write([]byte("more")) + if err == nil { + t.Error("second write should fail") + } + if n != expectedN { + t.Errorf("expected %d byte(s) written, got %d", expectedN, n) + } +} + func TestIncompleteWriter(t *testing.T) { t.Run("writes partial data", func(t *testing.T) { - w := NewIncompleteWriter(10) - - n, err := w.Write([]byte("hello")) - if err != nil { - t.Errorf("write should succeed, got error: %v", err) - } - if n != 5 { - t.Errorf("expected 5 bytes written, got %d", n) - } - - n, err = w.Write([]byte("world")) - if err != nil { - t.Errorf("partial write should succeed, got error: %v", err) - } - if n != 5 { - t.Errorf("expected 5 bytes written (reaching limit), got %d", n) - } + testWriterTwoWrites(t, NewIncompleteWriter(10), []byte("hello"), []byte("world"), 5, 5) }) t.Run("fails beyond limit", func(t *testing.T) { @@ -100,25 +112,7 @@ func TestIncompleteWriter(t *testing.T) { func TestPartialWriter(t *testing.T) { t.Run("writes until limit", func(t *testing.T) { - w := NewPartialWriter(10) - - // First write succeeds - n, err := w.Write([]byte("test")) - if err != nil { - t.Errorf("write should succeed, got error: %v", err) - } - if n != 4 { - t.Errorf("expected 4 bytes, got %d", n) - } - - // Write up to limit - n, err = w.Write([]byte("hello world")) - if err != nil { - t.Errorf("write should succeed, got error: %v", err) - } - if n != 6 { - t.Errorf("expected 6 bytes (to reach 10 total), got %d", n) - } + testWriterTwoWrites(t, NewPartialWriter(10), []byte("test"), []byte("hello world"), 4, 6) }) t.Run("returns zero without error beyond limit", func(t *testing.T) { @@ -165,30 +159,14 @@ func TestPartialReader(t *testing.T) { t.Run("reads data then errors", func(t *testing.T) { data := []byte("test data") r := NewPartialReader(data) - - // First read succeeds buf := make([]byte, 5) n, err := 
r.Read(buf) - if err != nil { - t.Errorf("first read should succeed, got error: %v", err) - } - if n != 5 { - t.Errorf("expected 5 bytes, got %d", n) - } - if string(buf) != "test " { - t.Errorf("expected 'test ', got %q", string(buf)) + assertReadResult(t, n, err, 5, "first read") + if string(buf[:n]) != "test " { + t.Errorf("expected 'test ', got %q", string(buf[:n])) } - - // Second read gets remaining data n, err = r.Read(buf) - if err != nil { - t.Errorf("second read should succeed, got error: %v", err) - } - if n != 4 { - t.Errorf("expected 4 bytes, got %d", n) - } - - // Third read returns error + assertReadResult(t, n, err, 4, "second read") n, err = r.Read(buf) if err == nil { t.Error("read beyond data should return error") diff --git a/api/go/internal/testhelpers/strings_test.go b/api/go/internal/testhelpers/strings_test.go index e84382f7..438627b0 100644 --- a/api/go/internal/testhelpers/strings_test.go +++ b/api/go/internal/testhelpers/strings_test.go @@ -80,6 +80,7 @@ func TestContainsIgnoreCase(t *testing.T) { }) } +//nolint:gocognit // table-driven index cases func TestIndexIgnoreCase(t *testing.T) { t.Run("finds index case insensitive", func(t *testing.T) { idx := IndexIgnoreCase("hello world", "WORLD") diff --git a/api/go/metadata/comment.go b/api/go/metadata/comment.go index b1baba8a..e9490f8f 100644 --- a/api/go/metadata/comment.go +++ b/api/go/metadata/comment.go @@ -4,7 +4,7 @@ // package comments. This file should contain all code related to package comments // as specified in api_metadata.md Section 1 and package_file_format.md Section 7. // -// Specification: api_metadata.md: 1. Comment Management +// Specification: api_metadata.md: 1 Comment Management // Package novuspack provides metadata domain structures for the NovusPack implementation. 
// @@ -44,15 +44,15 @@ func NewPackageComment() *PackageComment { // Specification: package_file_format.md: 7.1 Package Comment Format Specification type PackageComment struct { // CommentLength is the length of comment including null terminator - // Specification: package_file_format.md: 6.1.1 NewFileIndex Function + // Specification: api_metadata.md: 1.2 PackageComment Structure CommentLength uint32 // Comment is the UTF-8 encoded comment string (null-terminated) - // Specification: package_file_format.md: 6.1.1 NewFileIndex Function + // Specification: api_metadata.md: 1.2 PackageComment Structure Comment string // Reserved is reserved for future use (must be 0) - // Specification: package_file_format.md: 6.1.1 NewFileIndex Function + // Specification: api_metadata.md: 1.2 PackageComment Structure Reserved [3]uint8 } @@ -69,32 +69,11 @@ type PackageComment struct { // // Returns an error if any validation check fails. // -// Specification: api_metadata.md: 1.2 PackageComment Structure +// Specification: api_metadata.md: 1.3.4 PackageComment.Validate Method func (p *PackageComment) Validate() error { - // Check if comment exists if p.CommentLength == 0 { - if p.Comment != "" { - return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "comment length is zero but comment is not empty", nil, pkgerrors.ValidationErrorContext{ - Field: "Comment", - Value: p.Comment, - Expected: "empty string", - }) - } - // Empty comment is valid - // Verify reserved bytes are zero - for i, b := range p.Reserved { - if b != 0 { - return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, fmt.Sprintf("reserved byte %d must be zero", i), nil, pkgerrors.ValidationErrorContext{ - Field: "Reserved", - Value: b, - Expected: "0", - }) - } - } - return nil + return p.validateEmptyComment() } - - // CommentLength must not exceed maximum if p.CommentLength > MaxCommentLength { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "comment length exceeds maximum", nil, 
pkgerrors.ValidationErrorContext{ Field: "CommentLength", @@ -152,31 +131,40 @@ func (p *PackageComment) Validate() error { }) } - // Verify reserved bytes are zero + return p.validateReservedZero() +} + +func (p *PackageComment) validateEmptyComment() error { + if p.Comment != "" { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "comment length mismatch", nil, pkgerrors.ValidationErrorContext{ + Field: "CommentLength", Value: p.CommentLength, Expected: "non-zero when comment present", + }) + } + return p.validateReservedZero() +} + +func (p *PackageComment) validateReservedZero() error { for i, b := range p.Reserved { if b != 0 { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, fmt.Sprintf("reserved byte %d must be zero", i), nil, pkgerrors.ValidationErrorContext{ - Field: "Reserved", - Value: b, - Expected: "0", + Field: "Reserved", Value: b, Expected: "0", }) } } - return nil } // Size returns the total size of the PackageComment in bytes. // -// Specification: package_file_format.md: 6.1 File Index Structure +// Specification: api_metadata.md: 1.3.1 PackageComment.Size Method func (p *PackageComment) Size() int { return 4 + int(p.CommentLength) + 3 // Length(4) + Comment + Reserved(3) } // IsEmpty returns true if the comment is empty (CommentLength == 0). // -// Specification: api_generics.md: 1.3.1 PathEntry Structure -func (p *PackageComment) IsEmpty() bool { +// Specification: api_metadata.md: 1.3 PackageComment Methods +func (p *PackageComment) isEmpty() bool { return p.CommentLength == 0 } @@ -187,8 +175,8 @@ func (p *PackageComment) IsEmpty() bool { // // Returns an error if the comment exceeds MaxCommentLength or contains invalid UTF-8. 
// -// Specification: api_generics.md: 1.3.1 PathEntry Structure -func (p *PackageComment) SetComment(comment string) error { +// Specification: api_metadata.md: 1.3 PackageComment Methods +func (p *PackageComment) setComment(comment string) error { // Validate UTF-8 before processing if comment != "" && !utf8.ValidString(comment) { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "comment is not valid UTF-8", nil, pkgerrors.ValidationErrorContext{ @@ -245,8 +233,8 @@ func (p *PackageComment) SetComment(comment string) error { // // Returns an empty string if the comment is empty. // -// Specification: package_file_format.md: 1. `.nvpk` File Format Overview -func (p *PackageComment) GetComment() string { +// Specification: api_metadata.md: 1.3 PackageComment Methods +func (p *PackageComment) getComment() string { if p.CommentLength == 0 || p.Comment == "" { return "" } @@ -262,8 +250,8 @@ func (p *PackageComment) GetComment() string { // Clear removes the comment and resets all fields. // -// Specification: api_metadata.md: 1.1 Package-Level Comment Methods -func (p *PackageComment) Clear() { +// Specification: api_metadata.md: 1.3 PackageComment Methods +func (p *PackageComment) clear() { p.CommentLength = 0 p.Comment = "" p.Reserved = [3]uint8{0, 0, 0} @@ -280,7 +268,7 @@ func (p *PackageComment) Clear() { // // Returns the number of bytes read and any error encountered. // -// Specification: package_file_format.md: 6.1.1 NewFileIndex Function +// Specification: api_metadata.md: 1.3.3 PackageComment.ReadFrom Method func (p *PackageComment) ReadFrom(r io.Reader) (int64, error) { var totalRead int64 @@ -359,7 +347,7 @@ func (p *PackageComment) ReadFrom(r io.Reader) (int64, error) { // // Returns the number of bytes written and any error encountered. 
// -// Specification: package_file_format.md: 6.1.1 NewFileIndex Function +// Specification: api_metadata.md: 1.3.2 PackageComment.WriteTo Method func (p *PackageComment) WriteTo(w io.Writer) (int64, error) { var totalWritten int64 diff --git a/api/go/metadata/comment_constants.go b/api/go/metadata/comment_constants.go index 883fa827..f582255e 100644 --- a/api/go/metadata/comment_constants.go +++ b/api/go/metadata/comment_constants.go @@ -2,7 +2,7 @@ // comment length and validation constants. This file should contain only // constant definitions used for comment validation and processing. // -// Specification: api_metadata.md: 1. Comment Management +// Specification: api_metadata.md: 1 Comment Management // Package novuspack provides metadata domain structures for the NovusPack implementation. // @@ -11,5 +11,5 @@ package metadata // MaxCommentLength is the maximum allowed comment length (1MB - 1) -// Specification: package_file_format.md: 6.1.1 NewFileIndex Function +// Specification: package_file_format.md: 7.1 Package Comment Format Specification const MaxCommentLength = 1048575 diff --git a/api/go/metadata/comment_test.go b/api/go/metadata/comment_test.go index 9facf2a5..2ae3c874 100644 --- a/api/go/metadata/comment_test.go +++ b/api/go/metadata/comment_test.go @@ -61,14 +61,20 @@ func TestPackageCommentValidation(t *testing.T) { }, } - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - err := tt.comment.Validate() - if (err != nil) != tt.wantErr { - t.Errorf("Validate() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } + runCommentValidateTableTest(t, tests) +} + +func runCommentValidateTableTest(t *testing.T, tests []struct { + name string + comment PackageComment + wantErr bool +}) { + t.Helper() + cases := make([]validateCase, len(tests)) + for i := range tests { + cases[i] = validateCase{name: tests[i].name, subject: &tests[i].comment, wantErr: tests[i].wantErr} + } + runValidateTable(t, cases) } // TestPackageCommentSizeCalculation 
verifies size calculation @@ -109,14 +115,16 @@ func TestPackageCommentIsEmpty(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - if tt.comment.IsEmpty() != tt.wantEmpty { - t.Errorf("IsEmpty() = %v, want %v", tt.comment.IsEmpty(), tt.wantEmpty) + if tt.comment.isEmpty() != tt.wantEmpty { + t.Errorf("isEmpty() = %v, want %v", tt.comment.isEmpty(), tt.wantEmpty) } }) } } // TestPackageCommentSetComment verifies SetComment function +// +//nolint:gocognit // table-driven set/comment cases func TestPackageCommentSetComment(t *testing.T) { tests := []struct { name string @@ -139,7 +147,7 @@ func TestPackageCommentSetComment(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { var pc PackageComment - err := pc.SetComment(tt.comment) + err := pc.setComment(tt.comment) if (err != nil) != tt.wantErr { t.Errorf("SetComment() error = %v, wantErr %v", err, tt.wantErr) @@ -183,7 +191,7 @@ func TestPackageCommentGetComment(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - got := tt.comment.GetComment() + got := tt.comment.getComment() if got != tt.wantComment { t.Errorf("GetComment() = %q, want %q", got, tt.wantComment) } @@ -214,7 +222,7 @@ func TestNewPackageComment(t *testing.T) { } // Verify it's equivalent to an empty comment state - if !pc.IsEmpty() { + if !pc.isEmpty() { t.Errorf("IsEmpty() = false, want true for new PackageComment") } @@ -232,7 +240,7 @@ func TestPackageCommentClear(t *testing.T) { Reserved: [3]uint8{1, 2, 3}, } - pc.Clear() + pc.clear() if pc.CommentLength != 0 { t.Errorf("CommentLength = %d, want 0", pc.CommentLength) @@ -266,7 +274,7 @@ func TestPackageCommentReadFrom(t *testing.T) { { "Simple comment", []byte{ - 0x05, 0x00, 0x00, 0x00, // CommentLength = 5 + 0x05, 0x00, 0x00, 0x00, 0x74, 0x65, 0x73, 0x74, 0x00, // "test\x00" 0x00, 0x00, 0x00, // Reserved }, @@ -277,7 +285,7 @@ func TestPackageCommentReadFrom(t *testing.T) { { "Comment with newline", []byte{ - 0x0D, 
0x00, 0x00, 0x00, // CommentLength = 13 + 0x0D, 0x00, 0x00, 0x00, 0x74, 0x65, 0x73, 0x74, 0x0A, 0x63, 0x6F, 0x6D, 0x6D, 0x65, 0x6E, 0x74, 0x00, // "test\ncomment\x00" 0x00, 0x00, 0x00, // Reserved }, @@ -310,7 +318,7 @@ func TestPackageCommentReadFrom(t *testing.T) { t.Errorf("CommentLength = %d, want %d", pc.CommentLength, tt.wantLen) } - gotComment := pc.GetComment() + gotComment := pc.getComment() if gotComment != tt.wantComment { t.Errorf("GetComment() = %q, want %q", gotComment, tt.wantComment) } @@ -320,6 +328,8 @@ func TestPackageCommentReadFrom(t *testing.T) { } // TestPackageCommentWriteTo verifies WriteTo function +// +//nolint:gocognit // table-driven write cases func TestPackageCommentWriteTo(t *testing.T) { tests := []struct { name string @@ -373,6 +383,8 @@ func TestPackageCommentWriteTo(t *testing.T) { } // TestPackageCommentRoundTrip verifies round-trip serialization +// +//nolint:gocognit // table-driven round-trip func TestPackageCommentRoundTrip(t *testing.T) { tests := []struct { name string @@ -389,7 +401,7 @@ func TestPackageCommentRoundTrip(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { var pc1 PackageComment - if err := pc1.SetComment(tt.comment); err != nil { + if err := pc1.setComment(tt.comment); err != nil { t.Fatalf("SetComment() error = %v", err) } @@ -407,8 +419,8 @@ func TestPackageCommentRoundTrip(t *testing.T) { t.Errorf("CommentLength mismatch: %d != %d", pc1.CommentLength, pc2.CommentLength) } - if pc1.GetComment() != pc2.GetComment() { - t.Errorf("Comment mismatch: %q != %q", pc1.GetComment(), pc2.GetComment()) + if pc1.getComment() != pc2.getComment() { + t.Errorf("Comment mismatch: %q != %q", pc1.getComment(), pc2.getComment()) } // Validate the read comment @@ -585,18 +597,11 @@ func TestPackageCommentReadFromIncompleteData(t *testing.T) { {"Incomplete reserved", []byte{0x05, 0x00, 0x00, 0x00, 0x74, 0x65, 0x73, 0x74, 0x00, 0x00}}, {"Incomplete reserved bytes read", []byte{0x05, 0x00, 0x00, 
0x00, 0x74, 0x65, 0x73, 0x74, 0x00, 0x00, 0x00}}, } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - var pc PackageComment - r := bytes.NewReader(tt.data) - _, err := pc.ReadFrom(r) - - if err == nil { - t.Errorf("ReadFrom() expected error for incomplete data, got nil") - } - }) - } + runReadFromIncompleteExpectError(t, tests, func(data []byte) error { + var pc PackageComment + _, err := pc.ReadFrom(bytes.NewReader(data)) + return err + }) } // TestPackageCommentReadFromIncompleteReservedBytes tests incomplete reserved bytes read diff --git a/api/go/metadata/entry_io_helpers_test.go b/api/go/metadata/entry_io_helpers_test.go new file mode 100644 index 00000000..4deb0123 --- /dev/null +++ b/api/go/metadata/entry_io_helpers_test.go @@ -0,0 +1,88 @@ +// Shared table runners for HashEntry and OptionalDataEntry WriteTo/ReadFrom tests. + +package metadata + +import ( + "bytes" + "io" + "testing" +) + +type writeToEntry interface { + writeTo(io.Writer) (int64, error) +} + +type readFromEntry interface { + readFrom(io.Reader) (int64, error) +} + +type writeToCase struct { + name string + entry writeToEntry + wantErr bool + minSize int64 +} + +func runWriteToEntryTable(t *testing.T, tests []writeToCase) { + t.Helper() + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + var buf bytes.Buffer + n, err := tt.entry.writeTo(&buf) + if (err != nil) != tt.wantErr { + t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) + return + } + if tt.wantErr { + return + } + if n == 0 { + t.Error("WriteTo() wrote 0 bytes") + return + } + if tt.minSize > 0 && n < tt.minSize { + t.Errorf("WriteTo() wrote %d bytes, want at least %d", n, tt.minSize) + } + }) + } +} + +type writeToErrorCase struct { + name string + writer io.Writer + wantErr bool +} + +func runWriteToErrorPathsTable(t *testing.T, entry writeToEntry, tests []writeToErrorCase) { + t.Helper() + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + _, err := 
entry.writeTo(tt.writer) + if (err != nil) != tt.wantErr { + t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) + return + } + if tt.wantErr && err == nil { + t.Error("WriteTo() expected error but got nil") + } + }) + } +} + +type readFromIncompleteCase struct { + name string + data []byte +} + +func runReadFromIncompleteTable(t *testing.T, tests []readFromIncompleteCase, newEntry func() readFromEntry) { + t.Helper() + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + entry := newEntry() + _, err := entry.readFrom(bytes.NewReader(tt.data)) + if err == nil { + t.Errorf("ReadFrom() expected error for incomplete data, got nil") + } + }) + } +} diff --git a/api/go/metadata/fileentry.go b/api/go/metadata/fileentry.go index 27959c9e..935745b7 100644 --- a/api/go/metadata/fileentry.go +++ b/api/go/metadata/fileentry.go @@ -137,6 +137,47 @@ type FileEntry struct { PathMetadataEntries map[string]*PathMetadataEntry // Path -> PathMetadataEntry mapping } +// FileEntryFixed is the 64-byte fixed section used for binary read/write. +// Specification: package_file_format.md: 4.1 FileEntry Binary Format Specification +type FileEntryFixed struct { + FileID uint64 + OriginalSize uint64 + StoredSize uint64 + RawChecksum uint32 + StoredChecksum uint32 + FileVersion uint32 + MetadataVersion uint32 + PathCount uint16 + Type uint16 + CompressionType uint8 + CompressionLevel uint8 + EncryptionType uint8 + HashCount uint8 + HashDataOffset uint32 + HashDataLen uint16 + OptionalDataLen uint16 + OptionalDataOffset uint32 + Reserved uint32 +} + +// skipReaderToOffset skips bytes in r from currentOffset to targetOffset when targetOffset > currentOffset. +// Returns bytes skipped and nil, or 0 and a wrapped error on failure. 
+func skipReaderToOffset(r io.Reader, _, currentOffset, targetOffset int64, fieldName string, fieldValue uint32) (int64, error) { + if targetOffset <= 0 || targetOffset <= currentOffset { + return 0, nil + } + skip := targetOffset - currentOffset + _, err := io.CopyN(io.Discard, r, skip) + if err != nil { + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to skip to "+fieldName, pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: fieldValue, + Expected: "skip successful", + }) + } + return skip, nil +} + // Validate performs validation checks on the FileEntry. // // Validation checks: @@ -195,7 +236,7 @@ func (f *FileEntry) Validate() error { // Validate all hashes for i, hash := range f.Hashes { - if err := hash.Validate(); err != nil { + if err := hash.validate(); err != nil { return pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeValidation, fmt.Sprintf("invalid hash at index %d", i), pkgerrors.ValidationErrorContext{ Field: "Hashes", Value: i, @@ -206,7 +247,7 @@ func (f *FileEntry) Validate() error { // Validate all optional data for i, opt := range f.OptionalData { - if err := opt.Validate(); err != nil { + if err := opt.validate(); err != nil { return pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeValidation, fmt.Sprintf("invalid optional data at index %d", i), pkgerrors.ValidationErrorContext{ Field: "OptionalData", Value: i, @@ -230,8 +271,8 @@ func (f *FileEntry) FixedSize() int { // Specification: package_file_format.md: 4.1.4 Variable-Length Data (follows fixed structure) func (f *FileEntry) VariableSize() int { pathSize := lo.SumBy(f.Paths, func(p generics.PathEntry) int { return p.Size() }) - hashSize := lo.SumBy(f.Hashes, func(h HashEntry) int { return h.Size() }) - optSize := lo.SumBy(f.OptionalData, func(o OptionalDataEntry) int { return o.Size() }) + hashSize := lo.SumBy(f.Hashes, func(h HashEntry) int { return h.size() }) + optSize := lo.SumBy(f.OptionalData, func(o OptionalDataEntry) int { return 
o.size() }) return pathSize + hashSize + optSize } @@ -263,53 +304,19 @@ func NewFileEntry() *FileEntry { // - Hash entries (HashCount entries, starting at HashDataOffset) // - Optional data entries (starting at OptionalDataOffset) // -// Returns the number of bytes read and any error encountered. -// -// Specification: package_file_format.md: 4.1 FileEntry Binary Format Specification -func (f *FileEntry) ReadFrom(r io.Reader) (int64, error) { - var totalRead int64 - - // Read fixed section (64 bytes) - // Create a temporary struct to read the fixed fields - type FileEntryFixed struct { - FileID uint64 - OriginalSize uint64 - StoredSize uint64 - RawChecksum uint32 - StoredChecksum uint32 - FileVersion uint32 - MetadataVersion uint32 - PathCount uint16 - Type uint16 - CompressionType uint8 - CompressionLevel uint8 - EncryptionType uint8 - HashCount uint8 - HashDataOffset uint32 - HashDataLen uint16 - OptionalDataLen uint16 - OptionalDataOffset uint32 - Reserved uint32 - } - +// readFileEntryFixed reads the 64-byte fixed section, copies to f, and initializes slices; returns bytes read and error. 
+func readFileEntryFixed(r io.Reader, f *FileEntry) (int64, error) { var fixed FileEntryFixed if err := binary.Read(r, binary.LittleEndian, &fixed); err != nil { if err == io.EOF || err == io.ErrUnexpectedEOF { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, fmt.Sprintf("failed to read fixed section: incomplete data (read %d bytes, expected %d)", totalRead, FileEntryFixedSize), pkgerrors.ValidationErrorContext{ - Field: "FixedSection", - Value: totalRead, - Expected: fmt.Sprintf("%d bytes", FileEntryFixedSize), + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, fmt.Sprintf("failed to read fixed section: incomplete data (expected %d bytes)", FileEntryFixedSize), pkgerrors.ValidationErrorContext{ + Field: "FixedSection", Value: int64(0), Expected: fmt.Sprintf("%d bytes", FileEntryFixedSize), }) } - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read fixed section", pkgerrors.ValidationErrorContext{ - Field: "FixedSection", - Value: nil, - Expected: fmt.Sprintf("%d bytes", FileEntryFixedSize), + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read fixed section", pkgerrors.ValidationErrorContext{ + Field: "FixedSection", Value: nil, Expected: fmt.Sprintf("%d bytes", FileEntryFixedSize), }) } - totalRead += FileEntryFixedSize - - // Copy fixed fields to FileEntry f.FileID = fixed.FileID f.OriginalSize = fixed.OriginalSize f.StoredSize = fixed.StoredSize @@ -328,103 +335,96 @@ func (f *FileEntry) ReadFrom(r io.Reader) (int64, error) { f.OptionalDataLen = fixed.OptionalDataLen f.OptionalDataOffset = fixed.OptionalDataOffset f.Reserved = fixed.Reserved - - // Initialize slices f.Paths = make([]generics.PathEntry, 0, f.PathCount) f.Hashes = make([]HashEntry, 0, f.HashCount) f.OptionalData = make([]OptionalDataEntry, 0) + return FileEntryFixedSize, nil +} - // Read path entries (starting at offset 0) +// readFileEntryPaths reads PathCount path 
entries into f.Paths; returns total bytes read (including existing totalRead) and error. +func readFileEntryPaths(r io.Reader, f *FileEntry, totalRead int64) (int64, error) { for i := uint16(0); i < f.PathCount; i++ { var path generics.PathEntry n, err := path.ReadFrom(r) if err != nil { return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, fmt.Sprintf("failed to read path entry %d", i), pkgerrors.ValidationErrorContext{ - Field: "Paths", - Value: i, - Expected: "valid path entry", + Field: "Paths", Value: i, Expected: "valid path entry", }) } totalRead += n f.Paths = append(f.Paths, path) } + return totalRead, nil +} - // Read hash entries (starting at HashDataOffset) - // Calculate how many bytes we've read so far (paths) - pathsSize := int64(lo.SumBy(f.Paths, func(p generics.PathEntry) int { return p.Size() })) - - // If HashDataOffset is set, we may need to skip some bytes - if f.HashDataOffset > 0 && int64(f.HashDataOffset) > pathsSize { - skip := int64(f.HashDataOffset) - pathsSize - _, err := io.CopyN(io.Discard, r, skip) - if err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to skip to hash data offset", pkgerrors.ValidationErrorContext{ - Field: "HashDataOffset", - Value: f.HashDataOffset, - Expected: "skip successful", - }) - } - totalRead += skip +// readFileEntryHashes skips to HashDataOffset if needed, then reads HashCount hash entries; returns total bytes read and error. 
+func readFileEntryHashes(r io.Reader, f *FileEntry, totalRead, pathsSize int64) (int64, error) { + n, err := skipReaderToOffset(r, totalRead, pathsSize, int64(f.HashDataOffset), "HashDataOffset", f.HashDataOffset) + if err != nil { + return totalRead, err } - - // Read hash entries + totalRead += n for i := uint8(0); i < f.HashCount; i++ { var hash HashEntry - n, err := hash.ReadFrom(r) + n, err := hash.readFrom(r) if err != nil { return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, fmt.Sprintf("failed to read hash entry %d", i), pkgerrors.ValidationErrorContext{ - Field: "Hashes", - Value: i, - Expected: "valid hash entry", + Field: "Hashes", Value: i, Expected: "valid hash entry", }) } totalRead += n f.Hashes = append(f.Hashes, hash) } + return totalRead, nil +} - // Read optional data entries (starting at OptionalDataOffset) - // Calculate current position after paths and hashes - hashSize := int64(lo.SumBy(f.Hashes, func(h HashEntry) int { return h.Size() })) - currentOffset := pathsSize + hashSize - - if f.OptionalDataOffset > 0 && int64(f.OptionalDataOffset) > currentOffset { - skip := int64(f.OptionalDataOffset) - currentOffset - _, err := io.CopyN(io.Discard, r, skip) - if err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to skip to optional data offset", pkgerrors.ValidationErrorContext{ - Field: "OptionalDataOffset", - Value: f.OptionalDataOffset, - Expected: "skip successful", - }) - } - totalRead += skip +// readFileEntryOptionalData skips to OptionalDataOffset if needed, then reads optional data entries until OptionalDataLen bytes consumed. 
+func readFileEntryOptionalData(r io.Reader, f *FileEntry, totalRead, currentOffset int64) (int64, error) { + n, err := skipReaderToOffset(r, totalRead, currentOffset, int64(f.OptionalDataOffset), "OptionalDataOffset", f.OptionalDataOffset) + if err != nil { + return totalRead, err } - - // Read optional data entries - // We need to read until we've consumed OptionalDataLen bytes + totalRead += n optionalDataRead := int64(0) for optionalDataRead < int64(f.OptionalDataLen) { var opt OptionalDataEntry - n, err := opt.ReadFrom(r) + n, err := opt.readFrom(r) if err != nil { if err == io.EOF && optionalDataRead > 0 { - // We've read some optional data, this might be acceptable break } return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read optional data entry", pkgerrors.ValidationErrorContext{ - Field: "OptionalData", - Value: optionalDataRead, - Expected: fmt.Sprintf("%d bytes", f.OptionalDataLen), + Field: "OptionalData", Value: optionalDataRead, Expected: fmt.Sprintf("%d bytes", f.OptionalDataLen), }) } totalRead += n optionalDataRead += n f.OptionalData = append(f.OptionalData, opt) } - return totalRead, nil } +// Returns the number of bytes read and any error encountered. +// +// Specification: package_file_format.md: 4.1 FileEntry Binary Format Specification +func (f *FileEntry) ReadFrom(r io.Reader) (int64, error) { + totalRead, err := readFileEntryFixed(r, f) + if err != nil { + return totalRead, err + } + totalRead, err = readFileEntryPaths(r, f, totalRead) + if err != nil { + return totalRead, err + } + pathsSize := int64(lo.SumBy(f.Paths, func(p generics.PathEntry) int { return p.Size() })) + totalRead, err = readFileEntryHashes(r, f, totalRead, pathsSize) + if err != nil { + return totalRead, err + } + hashSize := int64(lo.SumBy(f.Hashes, func(h HashEntry) int { return h.size() })) + return readFileEntryOptionalData(r, f, totalRead, pathsSize+hashSize) +} + // WriteTo writes both metadata and data to a writer. 
// // Writes both metadata and data to a writer. diff --git a/api/go/metadata/fileentry_data.go b/api/go/metadata/fileentry_data.go index fd196c76..63b236bf 100644 --- a/api/go/metadata/fileentry_data.go +++ b/api/go/metadata/fileentry_data.go @@ -2,9 +2,9 @@ // It contains methods for loading, unloading, and managing file data in memory // and temporary files. This file should contain data management methods // (LoadData, UnloadData, GetData, SetData, temp file operations) as specified -// in api_file_mgmt_file_entry.md Section 1.4. +// in api_file_mgmt_file_entry.md Section 4. // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4 Data Management package metadata @@ -28,10 +28,10 @@ import ( // // Returns *PackageError on failure. // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.1.1 FileEntry LoadData Method func (f *FileEntry) LoadData(ctx context.Context) error { // Check if already loaded - if f.IsDataLoaded && len(f.Data) > 0 { + if f.IsDataLoaded { return nil } @@ -100,7 +100,7 @@ func (f *FileEntry) LoadData(ctx context.Context) error { // Clears file content from memory. // Releases memory resources. 
// -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.1.2 FileEntry.UnloadData Method func (f *FileEntry) UnloadData() { f.Data = nil f.IsDataLoaded = false @@ -116,10 +116,10 @@ func (f *FileEntry) UnloadData() { // - []byte: File content // - error: *PackageError on failure // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.1.3 FileEntry.GetData Method func (f *FileEntry) GetData() ([]byte, error) { // If data is already loaded, return it - if f.IsDataLoaded && len(f.Data) > 0 { + if f.IsDataLoaded { result := make([]byte, len(f.Data)) copy(result, f.Data) return result, nil @@ -150,15 +150,15 @@ func (f *FileEntry) GetData() ([]byte, error) { // Parameters: // - data: File content to set // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.1.4 FileEntry.SetData Method func (f *FileEntry) SetData(data []byte) { - f.Data = data - f.IsDataLoaded = len(data) > 0 - if f.IsDataLoaded { - f.ProcessingState = ProcessingStateComplete - } else { - f.ProcessingState = ProcessingStateIdle + if data == nil { + data = []byte{} } + + f.Data = data + f.IsDataLoaded = true + f.ProcessingState = ProcessingStateComplete } // GetProcessingState returns the current processing state. 
@@ -166,7 +166,7 @@ func (f *FileEntry) SetData(data []byte) { // Returns: // - ProcessingState: Current processing state // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.3.1 FileEntry.GetProcessingState Method func (f *FileEntry) GetProcessingState() ProcessingState { return f.ProcessingState } @@ -176,7 +176,7 @@ func (f *FileEntry) GetProcessingState() ProcessingState { // Parameters: // - state: Processing state to set // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.3.2 FileEntry.SetProcessingState Method func (f *FileEntry) SetProcessingState(state ProcessingState) { f.ProcessingState = state } @@ -188,8 +188,8 @@ func (f *FileEntry) SetProcessingState(state ProcessingState) { // - offset: Offset in source file // - size: Size of data to read from source // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions -func (f *FileEntry) SetSourceFile(file *os.File, offset, size int64) { +// Specification: api_file_mgmt_file_entry.md: 4.4 Source Tracking (CurrentSource/OriginalSource) +func (f *FileEntry) setSourceFile(file *os.File, offset, size int64) { f.SourceFile = file f.SourceOffset = offset f.SourceSize = size @@ -202,8 +202,8 @@ func (f *FileEntry) SetSourceFile(file *os.File, offset, size int64) { // - int64: Offset in source file // - int64: Size of data to read from source // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions -func (f *FileEntry) GetSourceFile() (*os.File, int64, int64) { +// Specification: api_file_mgmt_file_entry.md: 4.4 Source Tracking (CurrentSource/OriginalSource) +func (f *FileEntry) getSourceFile() (file *os.File, offset, size int64) { return f.SourceFile, f.SourceOffset, f.SourceSize } @@ -212,8 +212,8 @@ func (f *FileEntry) GetSourceFile() (*os.File, int64, int64) { // Parameters: // - path: Path to temporary file // -// Specification: 
api_file_mgmt_file_entry.md: 1.4 Helper Functions -func (f *FileEntry) SetTempPath(path string) { +// Specification: api_file_mgmt_file_entry.md: 4.2 Temporary File Operations +func (f *FileEntry) setTempPath(path string) { f.TempFilePath = path f.IsTempFile = path != "" } @@ -223,8 +223,8 @@ func (f *FileEntry) SetTempPath(path string) { // Returns: // - string: Path to temporary file (empty if not set) // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions -func (f *FileEntry) GetTempPath() string { +// Specification: api_file_mgmt_file_entry.md: 4.2 Temporary File Operations +func (f *FileEntry) getTempPath() string { return f.TempFilePath } @@ -238,7 +238,7 @@ func (f *FileEntry) GetTempPath() string { // // Returns *PackageError on failure. // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.2.1 FileEntry.CreateTempFile Method func (f *FileEntry) CreateTempFile(ctx context.Context) error { if ctx != nil { select { @@ -278,7 +278,7 @@ func (f *FileEntry) CreateTempFile(ctx context.Context) error { // // Returns *PackageError on failure. // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.2.2 FileEntry.StreamToTempFile Method func (f *FileEntry) StreamToTempFile(ctx context.Context) error { if ctx != nil { select { @@ -307,7 +307,7 @@ func (f *FileEntry) StreamToTempFile(ctx context.Context) error { } } - tmpFile, err := os.OpenFile(f.TempFilePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0600) + tmpFile, err := os.OpenFile(f.TempFilePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) if err != nil { return pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to open temporary file for writing", pkgerrors.ValidationErrorContext{ Field: "TempFilePath", @@ -355,7 +355,7 @@ func (f *FileEntry) StreamToTempFile(ctx context.Context) error { // // Returns *PackageError on failure. 
// -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.2.3 FileEntry.WriteToTempFile Method func (f *FileEntry) WriteToTempFile(ctx context.Context, data []byte) error { if ctx != nil { select { @@ -376,7 +376,7 @@ func (f *FileEntry) WriteToTempFile(ctx context.Context, data []byte) error { } } - tmpFile, err := os.OpenFile(f.TempFilePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0600) + tmpFile, err := os.OpenFile(f.TempFilePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600) if err != nil { return pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to open temporary file for writing", pkgerrors.ValidationErrorContext{ Field: "TempFilePath", @@ -416,7 +416,7 @@ func (f *FileEntry) WriteToTempFile(ctx context.Context, data []byte) error { // - []byte: Data read from temporary file // - error: *PackageError on failure // -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.2.4 FileEntry.ReadFromTempFile Method func (f *FileEntry) ReadFromTempFile(ctx context.Context, offset, size int64) ([]byte, error) { if ctx != nil { select { @@ -480,7 +480,7 @@ func (f *FileEntry) ReadFromTempFile(ctx context.Context, offset, size int64) ([ // // Returns *PackageError on failure. 
// -// Specification: api_file_mgmt_file_entry.md: 1.4 Helper Functions +// Specification: api_file_mgmt_file_entry.md: 4.2.5 FileEntry.CleanupTempFile Method func (f *FileEntry) CleanupTempFile(ctx context.Context) error { if ctx != nil { select { diff --git a/api/go/metadata/fileentry_data_test.go b/api/go/metadata/fileentry_data_test.go index fc43a71a..21eb9cd5 100644 --- a/api/go/metadata/fileentry_data_test.go +++ b/api/go/metadata/fileentry_data_test.go @@ -1,12 +1,15 @@ package metadata import ( + "bytes" "context" "os" "testing" ) // TestLoadData tests LoadData method +// +//nolint:gocognit,gocyclo // table-driven load cases func TestLoadData(t *testing.T) { // Create a temporary source file sourceFile, err := os.CreateTemp("", "novuspack-source-*") @@ -29,10 +32,8 @@ func TestLoadData(t *testing.T) { wantErr bool }{ { - name: "no source file", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "no source file", + setup: NewFileEntry, wantErr: true, }, { @@ -48,7 +49,7 @@ func TestLoadData(t *testing.T) { name: "successful load from source", setup: func() *FileEntry { fe := NewFileEntry() - fe.SetSourceFile(sourceFile, 0, int64(len(testData))) + fe.setSourceFile(sourceFile, 0, int64(len(testData))) return fe }, wantErr: false, @@ -57,7 +58,7 @@ func TestLoadData(t *testing.T) { name: "cancelled context", setup: func() *FileEntry { fe := NewFileEntry() - fe.SetSourceFile(sourceFile, 0, int64(len(testData))) + fe.setSourceFile(sourceFile, 0, int64(len(testData))) return fe }, wantErr: true, @@ -100,7 +101,7 @@ func TestLoadData(t *testing.T) { t.Run("incomplete read", func(t *testing.T) { fe := NewFileEntry() // Set source size larger than actual file content - fe.SetSourceFile(sourceFile, 0, int64(len(testData)+10)) + fe.setSourceFile(sourceFile, 0, int64(len(testData)+10)) err := fe.LoadData(context.Background()) if err == nil { t.Error("LoadData() with incomplete read should return error") @@ -119,7 +120,7 @@ func TestLoadData(t *testing.T) 
{ t.Fatalf("Failed to open /dev/null: %v", err) } defer func() { _ = badFile.Close() }() - fe.SetSourceFile(badFile, 0, 100) + fe.setSourceFile(badFile, 0, 100) err = fe.LoadData(context.Background()) if err == nil { t.Error("LoadData() with read error should return error") @@ -152,6 +153,8 @@ func TestUnloadData(t *testing.T) { } // TestGetData tests GetData method +// +//nolint:gocognit // table-driven get cases func TestGetData(t *testing.T) { // Create a temporary source file sourceFile, err := os.CreateTemp("", "novuspack-source-*") @@ -183,17 +186,15 @@ func TestGetData(t *testing.T) { wantErr: false, }, { - name: "no data available", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "no data available", + setup: NewFileEntry, wantErr: true, }, { name: "load from source file", setup: func() *FileEntry { fe := NewFileEntry() - fe.SetSourceFile(sourceFile, 0, int64(len(testData))) + fe.setSourceFile(sourceFile, 0, int64(len(testData))) return fe }, wantErr: false, @@ -233,14 +234,14 @@ func TestGetData(t *testing.T) { t.Error("GetData() returned empty data") } - if tt.name == "load from source file" && string(data) != string(testData) { + if tt.name == "load from source file" && !bytes.Equal(data, testData) { t.Errorf("GetData() data = %q, want %q", string(data), string(testData)) } } // Cleanup temp file if created if tt.name == "load from temp file" { - _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test } }) } @@ -261,12 +262,12 @@ func TestSetData(t *testing.T) { { name: "empty data", data: []byte{}, - wantState: ProcessingStateIdle, + wantState: ProcessingStateComplete, }, { name: "nil data", data: nil, - wantState: ProcessingStateIdle, + wantState: ProcessingStateComplete, }, } @@ -274,25 +275,26 @@ func TestSetData(t *testing.T) { t.Run(tt.name, func(t *testing.T) { fe := NewFileEntry() fe.SetData(tt.data) + checkSetDataResult(t, fe, 
tt.data, tt.wantState) + }) + } +} - if len(tt.data) > 0 { - if !fe.IsDataLoaded { - t.Error("SetData() did not set IsDataLoaded flag") - } - - if len(fe.Data) != len(tt.data) { - t.Errorf("SetData() Data length = %d, want %d", len(fe.Data), len(tt.data)) - } - } else { - if fe.IsDataLoaded { - t.Error("SetData() with empty data should not set IsDataLoaded flag") - } - } +// checkSetDataResult verifies FileEntry state after SetData; used to reduce TestSetData complexity. +func checkSetDataResult(t *testing.T, fe *FileEntry, data []byte, wantState ProcessingState) { + t.Helper() + if !fe.IsDataLoaded { + t.Error("SetData() did not set IsDataLoaded flag") + } - if fe.ProcessingState != tt.wantState { - t.Errorf("SetData() ProcessingState = %v, want %v", fe.ProcessingState, tt.wantState) - } - }) + if data == nil { + data = []byte{} + } + if len(fe.Data) != len(data) { + t.Errorf("SetData() Data length = %d, want %d", len(fe.Data), len(data)) + } + if fe.ProcessingState != wantState { + t.Errorf("SetData() ProcessingState = %v, want %v", fe.ProcessingState, wantState) } } @@ -320,79 +322,79 @@ func TestSetProcessingState(t *testing.T) { } } -// TestSetSourceFile tests SetSourceFile method +// TestSetSourceFile tests the internal setSourceFile helper. 
func TestSetSourceFile(t *testing.T) { fe := NewFileEntry() testFile, _ := os.Open(os.DevNull) defer func() { _ = testFile.Close() }() //nolint:errcheck // Close on exit - error is non-critical - fe.SetSourceFile(testFile, 10, 100) + fe.setSourceFile(testFile, 10, 100) if fe.SourceFile != testFile { - t.Error("SetSourceFile() did not set SourceFile") + t.Error("setSourceFile() did not set SourceFile") } if fe.SourceOffset != 10 { - t.Errorf("SetSourceFile() SourceOffset = %d, want 10", fe.SourceOffset) + t.Errorf("setSourceFile() SourceOffset = %d, want 10", fe.SourceOffset) } if fe.SourceSize != 100 { - t.Errorf("SetSourceFile() SourceSize = %d, want 100", fe.SourceSize) + t.Errorf("setSourceFile() SourceSize = %d, want 100", fe.SourceSize) } } -// TestGetSourceFile tests GetSourceFile method +// TestGetSourceFile tests the internal getSourceFile helper. func TestGetSourceFile(t *testing.T) { fe := NewFileEntry() testFile, _ := os.Open(os.DevNull) defer func() { _ = testFile.Close() }() //nolint:errcheck // Close on exit - error is non-critical - fe.SetSourceFile(testFile, 10, 100) + fe.setSourceFile(testFile, 10, 100) - file, offset, size := fe.GetSourceFile() + file, offset, size := fe.getSourceFile() if file != testFile { - t.Error("GetSourceFile() returned wrong file") + t.Error("getSourceFile() returned wrong file") } if offset != 10 { - t.Errorf("GetSourceFile() offset = %d, want 10", offset) + t.Errorf("getSourceFile() offset = %d, want 10", offset) } if size != 100 { - t.Errorf("GetSourceFile() size = %d, want 100", size) + t.Errorf("getSourceFile() size = %d, want 100", size) } } -// TestSetTempPath tests SetTempPath method +// TestSetTempPath tests the internal setTempPath helper. 
func TestSetTempPath(t *testing.T) { fe := NewFileEntry() testPath := "/tmp/test" - fe.SetTempPath(testPath) + fe.setTempPath(testPath) if fe.TempFilePath != testPath { - t.Errorf("SetTempPath() TempFilePath = %q, want %q", fe.TempFilePath, testPath) + t.Errorf("setTempPath() TempFilePath = %q, want %q", fe.TempFilePath, testPath) } if !fe.IsTempFile { - t.Error("SetTempPath() did not set IsTempFile flag") + t.Error("setTempPath() did not set IsTempFile flag") } - fe.SetTempPath("") + fe.setTempPath("") if fe.IsTempFile { - t.Error("SetTempPath() with empty path did not clear IsTempFile flag") + t.Error("setTempPath() with empty path did not clear IsTempFile flag") } } -// TestGetTempPath tests GetTempPath method +// TestGetTempPath tests the internal getTempPath helper. func TestGetTempPath(t *testing.T) { fe := NewFileEntry() testPath := "/tmp/test" - fe.SetTempPath(testPath) + fe.setTempPath(testPath) - if fe.GetTempPath() != testPath { - t.Errorf("GetTempPath() = %q, want %q", fe.GetTempPath(), testPath) + if fe.getTempPath() != testPath { + t.Errorf("getTempPath() = %q, want %q", fe.getTempPath(), testPath) } } @@ -446,6 +448,8 @@ func TestCreateTempFile(t *testing.T) { } // TestCleanupTempFile tests CleanupTempFile method +// +//nolint:gocognit // table-driven cleanup cases func TestCleanupTempFile(t *testing.T) { tests := []struct { name string @@ -465,10 +469,8 @@ func TestCleanupTempFile(t *testing.T) { wantErr: false, }, { - name: "cleanup with no temp file", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "cleanup with no temp file", + setup: NewFileEntry, wantErr: false, // Should not error if no temp file }, { @@ -540,7 +542,7 @@ func TestStreamToTempFile(t *testing.T) { } fe := NewFileEntry() - fe.SetSourceFile(sourceFile, 0, int64(len(testData))) + fe.setSourceFile(sourceFile, 0, int64(len(testData))) // Test successful streaming err = fe.StreamToTempFile(context.Background()) @@ -566,25 +568,21 @@ func TestStreamToTempFile(t 
*testing.T) { t.Fatalf("Failed to read temp file: %v", err) } - if string(readData) != string(testData) { + if !bytes.Equal(readData, testData) { t.Errorf("StreamToTempFile() data = %q, want %q", string(readData), string(testData)) } // Cleanup - _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test // Test with no source file - fe2 := NewFileEntry() - err = fe2.StreamToTempFile(context.Background()) - if err == nil { - t.Error("StreamToTempFile() with no source file should return error") - } + testStreamToTempFileNoSource(t) // Test with cancelled context ctx, cancel := context.WithCancel(context.Background()) cancel() fe3 := NewFileEntry() - fe3.SetSourceFile(sourceFile, 0, int64(len(testData))) + fe3.setSourceFile(sourceFile, 0, int64(len(testData))) err = fe3.StreamToTempFile(ctx) if err == nil { t.Error("StreamToTempFile() with cancelled context should return error") @@ -597,7 +595,7 @@ func TestStreamToTempFile(t *testing.T) { t.Fatalf("CreateTempFile() error = %v", err) } tempPath := fe4.TempFilePath - fe4.SetSourceFile(sourceFile, 0, int64(len(testData))) + fe4.setSourceFile(sourceFile, 0, int64(len(testData))) err = fe4.StreamToTempFile(context.Background()) if err != nil { t.Fatalf("StreamToTempFile() with existing temp file error = %v", err) @@ -605,29 +603,47 @@ func TestStreamToTempFile(t *testing.T) { if fe4.TempFilePath != tempPath { t.Error("StreamToTempFile() should reuse existing temp file path") } - _ = fe4.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe4.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test // Test seek error fe5 := NewFileEntry() // Use a closed file to cause seek error closedFile, _ := os.Open(os.DevNull) _ = closedFile.Close() - fe5.SetSourceFile(closedFile, 0, 10) + fe5.setSourceFile(closedFile, 0, 10) err = fe5.StreamToTempFile(context.Background()) if err == nil { 
t.Error("StreamToTempFile() with seek error should return error") } // Test copy error (incomplete copy) - fe6 := NewFileEntry() - fe6.SetSourceFile(sourceFile, 0, int64(len(testData)+100)) // Request more than available - err = fe6.StreamToTempFile(context.Background()) + testStreamToTempFileIncompleteCopy(t, sourceFile, testData) +} + +// testStreamToTempFileNoSource verifies StreamToTempFile returns error when no source file is set. +func testStreamToTempFileNoSource(t *testing.T) { + t.Helper() + fe := NewFileEntry() + err := fe.StreamToTempFile(context.Background()) + if err == nil { + t.Error("StreamToTempFile() with no source file should return error") + } +} + +// testStreamToTempFileIncompleteCopy verifies StreamToTempFile returns error when copy is incomplete. +func testStreamToTempFileIncompleteCopy(t *testing.T, sourceFile *os.File, testData []byte) { + t.Helper() + fe := NewFileEntry() + fe.setSourceFile(sourceFile, 0, int64(len(testData)+100)) // Request more than available + err := fe.StreamToTempFile(context.Background()) if err == nil { t.Error("StreamToTempFile() with incomplete copy should return error") } } // TestWriteToTempFile tests WriteToTempFile method +// +//nolint:gocognit,gocyclo // table-driven write cases func TestWriteToTempFile(t *testing.T) { fe := NewFileEntry() testData := []byte("test data to write") @@ -656,12 +672,12 @@ func TestWriteToTempFile(t *testing.T) { t.Fatalf("Failed to read temp file: %v", err) } - if string(readData) != string(testData) { + if !bytes.Equal(readData, testData) { t.Errorf("WriteToTempFile() data = %q, want %q", string(readData), string(testData)) } // Cleanup - _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test // Test with cancelled context ctx, cancel := context.WithCancel(context.Background()) @@ -686,7 +702,7 @@ func TestWriteToTempFile(t *testing.T) { if fe3.TempFilePath != tempPath { 
t.Error("WriteToTempFile() should reuse existing temp file path") } - _ = fe3.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe3.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test // Test write error (simulate by using invalid temp file path after creation) fe4 := NewFileEntry() @@ -709,7 +725,7 @@ func TestWriteToTempFile(t *testing.T) { t.Errorf("WriteToTempFile() ProcessingState = %v, want ProcessingStateError", fe4.ProcessingState) } } - _ = fe4.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe4.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test // Test file open error (use invalid directory path) fe5 := NewFileEntry() @@ -742,7 +758,7 @@ func TestReadFromTempFile(t *testing.T) { t.Fatalf("ReadFromTempFile() error = %v", err) } - if string(readData) != string(testData) { + if !bytes.Equal(readData, testData) { t.Errorf("ReadFromTempFile() data = %q, want %q", string(readData), string(testData)) } @@ -767,7 +783,7 @@ func TestReadFromTempFile(t *testing.T) { } // Cleanup - _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test // Test with no temp file fe2 := NewFileEntry() @@ -788,7 +804,7 @@ func TestReadFromTempFile(t *testing.T) { if err == nil { t.Error("ReadFromTempFile() with cancelled context should return error") } - _ = fe3.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe3.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test // Test read beyond file size fe4 := NewFileEntry() @@ -800,7 +816,7 @@ func TestReadFromTempFile(t *testing.T) { if err == nil { t.Error("ReadFromTempFile() beyond file size should return error") } - _ = fe4.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe4.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test // Test invalid 
offset fe5 := NewFileEntry() @@ -812,5 +828,5 @@ func TestReadFromTempFile(t *testing.T) { if err == nil { t.Error("ReadFromTempFile() with invalid offset should return error") } - _ = fe5.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe5.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test } diff --git a/api/go/metadata/fileentry_directory.go b/api/go/metadata/fileentry_directory.go index 684e2234..235902bb 100644 --- a/api/go/metadata/fileentry_directory.go +++ b/api/go/metadata/fileentry_directory.go @@ -21,7 +21,7 @@ import ( // Returns: // - string: Parent path string (empty if root-relative or not set) // -// Specification: api_file_mgmt_file_entry.md: 1 FileEntry Structure +// Specification: api_file_mgmt_file_entry.md: 5.9 FileEntry.GetParentPath Method func (f *FileEntry) GetParentPath() string { // Get parent path by deriving it from the path string for _, pme := range f.PathMetadataEntries { @@ -49,7 +49,7 @@ func (f *FileEntry) GetParentPath() string { // Returns: // - int: Path depth (0 for root-relative files) // -// Specification: api_metadata.md: 8.1.8 Path Management Methods +// Specification: api_file_mgmt_file_entry.md: 5.10 FileEntry.GetDirectoryDepth Method func (f *FileEntry) GetDirectoryDepth() int { // Get depth from the first associated PathMetadataEntry for _, pme := range f.PathMetadataEntries { @@ -69,7 +69,7 @@ func (f *FileEntry) GetDirectoryDepth() int { // Returns: // - bool: True if file is root-relative, false otherwise // -// Specification: api_file_mgmt_file_entry.md: 1 FileEntry Structure +// Specification: api_file_mgmt_file_entry.md: 5.11 FileEntry.IsRootRelative Method func (f *FileEntry) IsRootRelative() bool { // Check if any associated PathMetadataEntry has a parent path for _, pme := range f.PathMetadataEntries { diff --git a/api/go/metadata/fileentry_directory_test.go b/api/go/metadata/fileentry_directory_test.go index e044a23e..9a3181ee 100644 --- 
a/api/go/metadata/fileentry_directory_test.go +++ b/api/go/metadata/fileentry_directory_test.go @@ -6,119 +6,51 @@ import ( "github.com/novus-engine/novuspack/api/go/generics" ) -// TestGetParentPath tests GetParentPath method -func TestGetParentPath(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - want string - }{ - { - name: "no path metadata entries", - setup: func() *FileEntry { - return NewFileEntry() - }, - want: "", - }, - { - name: "with path metadata entry", - setup: func() *FileEntry { - fe := NewFileEntry() - fe.Paths = []generics.PathEntry{ - {Path: "/test/file.txt", PathLength: 14}, - } - pme := &PathMetadataEntry{ - Path: generics.PathEntry{Path: "/test/file.txt", PathLength: 14}, - } - if fe.PathMetadataEntries == nil { - fe.PathMetadataEntries = make(map[string]*PathMetadataEntry) - } - fe.PathMetadataEntries["/test/file.txt"] = pme - return fe - }, - want: "/test", - }, - } +type fileEntryTableCase struct { + name string + setup func() *FileEntry + want interface{} +} +func runFileEntryTableTest(t *testing.T, tests []fileEntryTableCase, getter func(*FileEntry) interface{}, format string) { + t.Helper() for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { fe := tt.setup() - got := fe.GetParentPath() - + got := getter(fe) if got != tt.want { - t.Errorf("GetParentPath() = %q, want %q", got, tt.want) + t.Errorf(format, got, tt.want) } }) } } -// TestGetDirectoryDepth tests GetDirectoryDepth method -func TestGetDirectoryDepth(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - want int - }{ - { - name: "no path metadata entries", - setup: func() *FileEntry { - return NewFileEntry() - }, - want: 0, - }, - { - name: "one level deep", - setup: func() *FileEntry { - fe := NewFileEntry() - fe.Paths = []generics.PathEntry{ - {Path: "/test/file.txt", PathLength: 14}, - } - pme := &PathMetadataEntry{ - Path: generics.PathEntry{Path: "/test/file.txt", PathLength: 14}, - } - // Set depth to 1 - 
parent := &PathMetadataEntry{ - Path: generics.PathEntry{Path: "/test", PathLength: 5}, - } - pme.ParentPath = parent - if fe.PathMetadataEntries == nil { - fe.PathMetadataEntries = make(map[string]*PathMetadataEntry) - } - fe.PathMetadataEntries["/test/file.txt"] = pme - return fe - }, - want: 1, - }, +// fileEntryWithPathAndParent returns a FileEntry with one path and PathMetadataEntry with given parent path (depth 1). +func fileEntryWithPathAndParent(filePath, parentPath string) *FileEntry { + fe := NewFileEntry() + fe.Paths = []generics.PathEntry{ + {Path: filePath, PathLength: uint16(len(filePath))}, } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - got := fe.GetDirectoryDepth() - - if got != tt.want { - t.Errorf("GetDirectoryDepth() = %d, want %d", got, tt.want) - } - }) + pme := &PathMetadataEntry{ + Path: generics.PathEntry{Path: filePath, PathLength: uint16(len(filePath))}, } + parent := &PathMetadataEntry{ + Path: generics.PathEntry{Path: parentPath, PathLength: uint16(len(parentPath))}, + } + pme.ParentPath = parent + if fe.PathMetadataEntries == nil { + fe.PathMetadataEntries = make(map[string]*PathMetadataEntry) + } + fe.PathMetadataEntries[filePath] = pme + return fe } -// TestIsRootRelative tests IsRootRelative method -func TestIsRootRelative(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - want bool - }{ - { - name: "root relative", - setup: func() *FileEntry { - return NewFileEntry() - }, - want: true, - }, +// TestGetParentPath tests GetParentPath method +func TestGetParentPath(t *testing.T) { + tests := []fileEntryTableCase{ + {name: "no path metadata entries", setup: NewFileEntry, want: ""}, { - name: "not root relative", + name: "with path metadata entry", setup: func() *FileEntry { fe := NewFileEntry() fe.Paths = []generics.PathEntry{ @@ -127,28 +59,32 @@ func TestIsRootRelative(t *testing.T) { pme := &PathMetadataEntry{ Path: generics.PathEntry{Path: "/test/file.txt", 
PathLength: 14}, } - parent := &PathMetadataEntry{ - Path: generics.PathEntry{Path: "/test", PathLength: 5}, - } - pme.ParentPath = parent if fe.PathMetadataEntries == nil { fe.PathMetadataEntries = make(map[string]*PathMetadataEntry) } fe.PathMetadataEntries["/test/file.txt"] = pme return fe }, - want: false, + want: "/test", }, } + runFileEntryTableTest(t, tests, func(fe *FileEntry) interface{} { return fe.GetParentPath() }, "GetParentPath() = %q, want %q") +} - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - got := fe.IsRootRelative() +// TestGetDirectoryDepth tests GetDirectoryDepth method +func TestGetDirectoryDepth(t *testing.T) { + tests := []fileEntryTableCase{ + {name: "no path metadata entries", setup: NewFileEntry, want: 0}, + {name: "one level deep", setup: func() *FileEntry { return fileEntryWithPathAndParent("/test/file.txt", "/test") }, want: 1}, + } + runFileEntryTableTest(t, tests, func(fe *FileEntry) interface{} { return fe.GetDirectoryDepth() }, "GetDirectoryDepth() = %d, want %d") +} - if got != tt.want { - t.Errorf("IsRootRelative() = %v, want %v", got, tt.want) - } - }) +// TestIsRootRelative tests IsRootRelative method +func TestIsRootRelative(t *testing.T) { + tests := []fileEntryTableCase{ + {name: "root relative", setup: NewFileEntry, want: true}, + {name: "not root relative", setup: func() *FileEntry { return fileEntryWithPathAndParent("/test/file.txt", "/test") }, want: false}, } + runFileEntryTableTest(t, tests, func(fe *FileEntry) interface{} { return fe.IsRootRelative() }, "IsRootRelative() = %v, want %v") } diff --git a/api/go/metadata/fileentry_marshal.go b/api/go/metadata/fileentry_marshal.go index d971ba87..7045f324 100644 --- a/api/go/metadata/fileentry_marshal.go +++ b/api/go/metadata/fileentry_marshal.go @@ -18,6 +18,22 @@ import ( "github.com/novus-engine/novuspack/api/go/pkgerrors" ) +// writeSliceToWriter writes slice entries to w using writeAt(i) for each index; updates totalWritten. 
+func writeSliceToWriter(w io.Writer, totalWritten int64, n int, fieldName string, writeAt func(i int) (int64, error)) (int64, error) { + for i := 0; i < n; i++ { + written, err := writeAt(i) + if err != nil { + return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, fmt.Sprintf("failed to write %s entry %d", fieldName, i), pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: i, + Expected: "written successfully", + }) + } + totalWritten += written + } + return totalWritten, nil +} + // MarshalMeta marshals the FileEntry metadata (header + variable data) to bytes. // // Marshals the complete FileEntry metadata structure including: @@ -60,7 +76,7 @@ func (f *FileEntry) MarshalMeta() ([]byte, error) { // // Specification: api_file_mgmt_file_entry.md: 6. Marshaling func (f *FileEntry) MarshalData() ([]byte, error) { - if !f.IsDataLoaded && len(f.Data) == 0 { + if !f.IsDataLoaded { return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "file entry data not available", nil, pkgerrors.ValidationErrorContext{ Field: "Data", Value: nil, @@ -86,7 +102,7 @@ func (f *FileEntry) MarshalData() ([]byte, error) { // - ErrTypeIO: I/O error during marshaling // // Specification: api_file_mgmt_file_entry.md: 6. 
Marshaling -func (f *FileEntry) Marshal() (metadata []byte, data []byte, err error) { +func (f *FileEntry) Marshal() (metadata, data []byte, err error) { metadata, err = f.MarshalMeta() if err != nil { return nil, nil, err @@ -133,40 +149,19 @@ func (f *FileEntry) WriteMetaTo(w io.Writer) (int64, error) { hashSize := 0 for _, h := range f.Hashes { - hashSize += h.Size() + hashSize += h.size() } f.HashDataOffset = uint32(pathsSize) f.HashDataLen = uint16(hashSize) optionalDataSize := 0 for _, o := range f.OptionalData { - optionalDataSize += o.Size() + optionalDataSize += o.size() } f.OptionalDataOffset = uint32(pathsSize + hashSize) f.OptionalDataLen = uint16(optionalDataSize) // Write fixed section (64 bytes) - type FileEntryFixed struct { - FileID uint64 - OriginalSize uint64 - StoredSize uint64 - RawChecksum uint32 - StoredChecksum uint32 - FileVersion uint32 - MetadataVersion uint32 - PathCount uint16 - Type uint16 - CompressionType uint8 - CompressionLevel uint8 - EncryptionType uint8 - HashCount uint8 - HashDataOffset uint32 - HashDataLen uint16 - OptionalDataLen uint16 - OptionalDataOffset uint32 - Reserved uint32 - } - fixed := FileEntryFixed{ FileID: f.FileID, OriginalSize: f.OriginalSize, @@ -198,42 +193,22 @@ func (f *FileEntry) WriteMetaTo(w io.Writer) (int64, error) { totalWritten += FileEntryFixedSize // Write path entries (starting at offset 0) - for i, path := range f.Paths { - n, err := path.WriteTo(w) - if err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, fmt.Sprintf("failed to write path entry %d", i), pkgerrors.ValidationErrorContext{ - Field: "Paths", - Value: i, - Expected: "written successfully", - }) - } - totalWritten += n + var err error + totalWritten, err = writeSliceToWriter(w, totalWritten, len(f.Paths), "Paths", func(i int) (int64, error) { return f.Paths[i].WriteTo(w) }) + if err != nil { + return totalWritten, err } // Write hash entries (starting at HashDataOffset) - for i, hash := range 
f.Hashes { - n, err := hash.WriteTo(w) - if err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, fmt.Sprintf("failed to write hash entry %d", i), pkgerrors.ValidationErrorContext{ - Field: "Hashes", - Value: i, - Expected: "written successfully", - }) - } - totalWritten += n + totalWritten, err = writeSliceToWriter(w, totalWritten, len(f.Hashes), "Hashes", func(i int) (int64, error) { return f.Hashes[i].writeTo(w) }) + if err != nil { + return totalWritten, err } // Write optional data entries (starting at OptionalDataOffset) - for i, opt := range f.OptionalData { - n, err := opt.WriteTo(w) - if err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, fmt.Sprintf("failed to write optional data entry %d", i), pkgerrors.ValidationErrorContext{ - Field: "OptionalData", - Value: i, - Expected: "written successfully", - }) - } - totalWritten += n + totalWritten, err = writeSliceToWriter(w, totalWritten, len(f.OptionalData), "OptionalData", func(i int) (int64, error) { return f.OptionalData[i].writeTo(w) }) + if err != nil { + return totalWritten, err } return totalWritten, nil @@ -257,9 +232,11 @@ func (f *FileEntry) WriteMetaTo(w io.Writer) (int64, error) { // Follows Go's standard io.WriterTo pattern. // // Specification: api_file_mgmt_file_entry.md: 1. 
FileEntry Structure +// +//nolint:gocognit // branch count from data source and validation paths func (f *FileEntry) WriteDataTo(w io.Writer) (int64, error) { // If data is in memory, write it directly - if f.IsDataLoaded && len(f.Data) > 0 { + if f.IsDataLoaded { n, err := w.Write(f.Data) if err != nil { return int64(n), pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write data", pkgerrors.ValidationErrorContext{ diff --git a/api/go/metadata/fileentry_marshal_test.go b/api/go/metadata/fileentry_marshal_test.go index 635e8b3a..8427d9e2 100644 --- a/api/go/metadata/fileentry_marshal_test.go +++ b/api/go/metadata/fileentry_marshal_test.go @@ -11,6 +11,17 @@ import ( "github.com/novus-engine/novuspack/api/go/internal/testhelpers" ) +func runWriteMetaToFailingWriterTest(t *testing.T, errMsg string) { + t.Helper() + fe := NewFileEntry() + fe.FileID = 1 + failingWriter := testhelpers.NewErrorWriter() + _, err := fe.WriteMetaTo(failingWriter) + if err == nil { + t.Error(errMsg) + } +} + // TestMarshalMeta tests MarshalMeta method func TestMarshalMeta(t *testing.T) { tests := []struct { @@ -74,16 +85,8 @@ func TestMarshalMeta(t *testing.T) { }) } - // Test error path - failing writer t.Run("failing writer", func(t *testing.T) { - fe := NewFileEntry() - fe.FileID = 1 - // MarshalMeta uses WriteMetaTo internally, so test that path - failingWriter := testhelpers.NewErrorWriter() - _, err := fe.WriteMetaTo(failingWriter) - if err == nil { - t.Error("MarshalMeta() with failing writer should return error") - } + runWriteMetaToFailingWriterTest(t, "MarshalMeta() with failing writer should return error") }) } @@ -104,10 +107,8 @@ func TestMarshalData(t *testing.T) { wantErr: false, }, { - name: "no data available", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "no data available", + setup: NewFileEntry, wantErr: true, }, } @@ -240,19 +241,14 @@ func TestWriteMetaTo(t *testing.T) { }) } - // Test error path - failing writer t.Run("failing 
writer", func(t *testing.T) { - fe := NewFileEntry() - fe.FileID = 1 - failingWriter := testhelpers.NewErrorWriter() - _, err := fe.WriteMetaTo(failingWriter) - if err == nil { - t.Error("WriteMetaTo() with failing writer should return error") - } + runWriteMetaToFailingWriterTest(t, "WriteMetaTo() with failing writer should return error") }) } // TestWriteDataTo tests WriteDataTo method +// +//nolint:gocognit // table-driven write cases func TestWriteDataTo(t *testing.T) { // Create a temporary source file sourceFile, err := os.CreateTemp("", "novuspack-source-*") @@ -284,17 +280,15 @@ func TestWriteDataTo(t *testing.T) { wantErr: false, }, { - name: "no data available", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "no data available", + setup: NewFileEntry, wantErr: true, }, { name: "data from source file", setup: func() *FileEntry { fe := NewFileEntry() - fe.SetSourceFile(sourceFile, 0, int64(len(testData))) + fe.setSourceFile(sourceFile, 0, int64(len(testData))) return fe }, wantErr: false, @@ -336,7 +330,7 @@ func TestWriteDataTo(t *testing.T) { // Cleanup temp file if created if tt.name == "data from temp file" { - _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck + _ = fe.CleanupTempFile(context.Background()) //nolint:errcheck // cleanup best-effort in test } }) } @@ -347,7 +341,7 @@ func TestWriteDataTo(t *testing.T) { // Use a closed file to cause seek error closedFile, _ := os.Open(os.DevNull) _ = closedFile.Close() - fe.SetSourceFile(closedFile, 0, 10) + fe.setSourceFile(closedFile, 0, 10) var buf bytes.Buffer _, err := fe.WriteDataTo(&buf) if err == nil { @@ -367,7 +361,7 @@ func TestWriteDataTo(t *testing.T) { _ = os.Remove(sourceFile.Name()) }() // Request more data than available - fe.SetSourceFile(sourceFile, 0, 100) + fe.setSourceFile(sourceFile, 0, 100) var buf bytes.Buffer _, err = fe.WriteDataTo(&buf) if err == nil { @@ -379,7 +373,7 @@ func TestWriteDataTo(t *testing.T) { t.Run("temp file open error", func(t 
*testing.T) { fe := NewFileEntry() // Set invalid temp file path - fe.SetTempPath("/invalid/path/that/does/not/exist") + fe.setTempPath("/invalid/path/that/does/not/exist") var buf bytes.Buffer _, err := fe.WriteDataTo(&buf) // This might succeed if it falls back to other data sources, or fail diff --git a/api/go/metadata/fileentry_path_test.go b/api/go/metadata/fileentry_path_test.go index 6cfced99..843141d2 100644 --- a/api/go/metadata/fileentry_path_test.go +++ b/api/go/metadata/fileentry_path_test.go @@ -6,13 +6,19 @@ import ( "github.com/novus-engine/novuspack/api/go/generics" ) +// fileEntryWithTwoPathsFirstSymlink returns a FileEntry with two paths; the first is a symlink with the given target. +func fileEntryWithTwoPathsFirstSymlink(path1, linkTarget, path2 string) *FileEntry { + fe := NewFileEntry() + fe.Paths = []generics.PathEntry{ + {Path: path1, IsSymlink: true, LinkTarget: linkTarget}, + {Path: path2}, + } + return fe +} + // TestHasSymlinks tests HasSymlinks method func TestHasSymlinks(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - want bool - }{ + tests := []fileEntryTableCase{ { name: "no symlinks", setup: func() *FileEntry { @@ -37,25 +43,9 @@ func TestHasSymlinks(t *testing.T) { }, want: true, }, - { - name: "no paths", - setup: func() *FileEntry { - return NewFileEntry() - }, - want: false, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - got := fe.HasSymlinks() - - if got != tt.want { - t.Errorf("HasSymlinks() = %v, want %v", got, tt.want) - } - }) + {name: "no paths", setup: NewFileEntry, want: false}, } + runFileEntryTableTest(t, tests, func(fe *FileEntry) interface{} { return fe.HasSymlinks() }, "HasSymlinks() = %v, want %v") } // TestGetSymlinkPaths tests GetSymlinkPaths method @@ -84,11 +74,7 @@ func TestGetSymlinkPaths(t *testing.T) { // TestGetPrimaryPath tests GetPrimaryPath method func TestGetPrimaryPath(t *testing.T) { - tests := []struct { - name 
string - setup func() *FileEntry - want string - }{ + tests := []fileEntryTableCase{ { name: "has paths", setup: func() *FileEntry { @@ -99,27 +85,11 @@ func TestGetPrimaryPath(t *testing.T) { } return fe }, - want: "test/file1", // display format (no leading slash) - }, - { - name: "no paths", - setup: func() *FileEntry { - return NewFileEntry() - }, - want: "", + want: "test/file1", }, + {name: "no paths", setup: NewFileEntry, want: ""}, } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - got := fe.GetPrimaryPath() - - if got != tt.want { - t.Errorf("GetPrimaryPath() = %q, want %q", got, tt.want) - } - }) - } + runFileEntryTableTest(t, tests, func(fe *FileEntry) interface{} { return fe.GetPrimaryPath() }, "GetPrimaryPath() = %q, want %q") } // TestResolveAllSymlinks tests ResolveAllSymlinks method @@ -132,24 +102,14 @@ func TestResolveAllSymlinks(t *testing.T) { { name: "absolute symlink target", setup: func() *FileEntry { - fe := NewFileEntry() - fe.Paths = []generics.PathEntry{ - {Path: "/test/file1", IsSymlink: true, LinkTarget: "/absolute/target"}, - {Path: "/test/file2"}, - } - return fe + return fileEntryWithTwoPathsFirstSymlink("/test/file1", "/absolute/target", "/test/file2") }, want: []string{"/absolute/target", "/test/file2"}, }, { name: "relative symlink target", setup: func() *FileEntry { - fe := NewFileEntry() - fe.Paths = []generics.PathEntry{ - {Path: "/test/file1", IsSymlink: true, LinkTarget: "relative/target"}, - {Path: "/test/file2"}, - } - return fe + return fileEntryWithTwoPathsFirstSymlink("/test/file1", "relative/target", "/test/file2") }, want: []string{"/test/relative/target", "/test/file2"}, }, diff --git a/api/go/metadata/fileentry_tags.go b/api/go/metadata/fileentry_tags.go index 1da97d7b..f9e70ed9 100644 --- a/api/go/metadata/fileentry_tags.go +++ b/api/go/metadata/fileentry_tags.go @@ -21,7 +21,7 @@ import ( func (f *FileEntry) updateOptionalDataLen() { f.OptionalDataLen = 0 for _, opt := range 
f.OptionalData { - f.OptionalDataLen += uint16(opt.Size()) + f.OptionalDataLen += uint16(opt.size()) } } @@ -33,104 +33,78 @@ func (f *FileEntry) removeOptionalDataEntry(index int) { f.updateOptionalDataLen() } +// parseTagFromRaw unmarshals one tag from raw JSON; returns nil tag and error if invalid or unknown ValueType. +func parseTagFromRaw(rawTag json.RawMessage) (*generics.Tag[any], error) { + var tagData struct { + Key string + ValueType generics.TagValueType + Value any + } + if err := json.Unmarshal(rawTag, &tagData); err != nil { + return nil, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to parse individual tag from optional data", pkgerrors.ValidationErrorContext{ + Field: "tag", Value: string(rawTag), Expected: "valid tag JSON object", + }) + } + if tagData.ValueType > generics.TagValueTypeNovusPackMetadata { + return nil, pkgerrors.NewTypedPackageError(pkgerrors.ErrTypeCorruption, "invalid tag value type", nil, pkgerrors.ValidationErrorContext{ + Field: "ValueType", Value: tagData.ValueType, Expected: "valid TagValueType constant (0x00-0x10)", + }) + } + return generics.NewTag(tagData.Key, tagData.Value, tagData.ValueType), nil +} + // getTagsFromOptionalData extracts tags from OptionalData entries. // // Returns a map of tag keys to Tag[any] pointers and an error if corruption is encountered. // If individual tags are corrupted, they are skipped while preserving valid tags. // If the entire OptionalData entry is corrupted beyond recovery, it is removed. -// The error indicates that corruption was encountered, allowing calling code to handle it appropriately. 
+// +//nolint:gocognit // loop + corruption handling branches func (f *FileEntry) getTagsFromOptionalData() (map[string]*generics.Tag[any], error) { tags := make(map[string]*generics.Tag[any]) var corruptionErr error - - // Find the tags OptionalDataEntry (DataType 0x00) for i, opt := range f.OptionalData { - if opt.DataType == OptionalDataTagsData { - // First, try to unmarshal as an array of raw JSON messages - // This allows us to parse individual tags even if some are corrupted - var rawTags []json.RawMessage - if err := json.Unmarshal(opt.Data, &rawTags); err != nil { - // If we can't even parse the array structure, remove the entire entry - corruptionErr = pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to parse tags array from optional data", pkgerrors.ValidationErrorContext{ - Field: "OptionalData", - Value: opt.Data, - Expected: "valid JSON tag array", - }) - f.removeOptionalDataEntry(i) - return tags, corruptionErr - } - - // Parse each tag individually, skipping corrupted ones - corruptedCount := 0 - for _, rawTag := range rawTags { - var tagData struct { - Key string - ValueType generics.TagValueType - Value any - } - if err := json.Unmarshal(rawTag, &tagData); err != nil { - // Track corruption but continue parsing other tags - corruptedCount++ - if corruptionErr == nil { - corruptionErr = pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to parse individual tag from optional data", pkgerrors.ValidationErrorContext{ - Field: "tag", - Value: string(rawTag), - Expected: "valid tag JSON object", - }) - } - continue - } - - // Validate that ValueType is a valid TagValueType constant (0x00-0x10) - // Note: TagValueTypeString = 0x00 is valid, but we need to ensure it's explicitly set - if tagData.ValueType > generics.TagValueTypeNovusPackMetadata { - // Invalid tag type - skip this tag - corruptedCount++ - if corruptionErr == nil { - corruptionErr = pkgerrors.NewTypedPackageError(pkgerrors.ErrTypeCorruption, 
"invalid tag value type", nil, pkgerrors.ValidationErrorContext{ - Field: "ValueType", - Value: tagData.ValueType, - Expected: "valid TagValueType constant (0x00-0x10)", - }) - } - continue + if opt.DataType != OptionalDataTagsData { + continue + } + var rawTags []json.RawMessage + if err := json.Unmarshal(opt.Data, &rawTags); err != nil { + corruptionErr = pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to parse tags array from optional data", pkgerrors.ValidationErrorContext{ + Field: "OptionalData", Value: opt.Data, Expected: "valid JSON tag array", + }) + f.removeOptionalDataEntry(i) + return tags, corruptionErr + } + corruptedCount := 0 + for _, rawTag := range rawTags { + tag, err := parseTagFromRaw(rawTag) + if err != nil { + corruptedCount++ + if corruptionErr == nil { + corruptionErr = err } - - // Create Tag[any] with appropriate type conversion - tag := generics.NewTag(tagData.Key, tagData.Value, tagData.ValueType) - tags[tagData.Key] = tag - } - - // If we encountered corruption, update the error message with count - if corruptedCount > 0 && corruptionErr != nil { - errType, _ := pkgerrors.GetErrorType(corruptionErr) - corruptionErr = pkgerrors.WrapErrorWithContext(corruptionErr, errType, - "encountered corrupted tags during parsing", pkgerrors.ValidationErrorContext{ - Field: "corrupted_tags", - Value: corruptedCount, - Expected: "all tags valid", - }) + continue } - - // If we successfully parsed at least some tags, update the OptionalData - // to reflect only the valid tags (this repairs the data) - if len(tags) > 0 { - // Re-sync to remove any corrupted tags from storage - if err := f.syncTagsToOptionalData(tags); err != nil { - // If sync fails, return what we have with both errors - if corruptionErr != nil { - return tags, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to repair corrupted tags", corruptionErr) - } - return tags, err + tags[tag.Key] = tag + } + if corruptedCount > 0 && 
corruptionErr != nil { + errType, _ := pkgerrors.GetErrorType(corruptionErr) + corruptionErr = pkgerrors.WrapErrorWithContext(corruptionErr, errType, "encountered corrupted tags during parsing", pkgerrors.ValidationErrorContext{ + Field: "corrupted_tags", Value: corruptedCount, Expected: "all tags valid", + }) + } + if len(tags) > 0 { + if err := f.syncTagsToOptionalData(tags); err != nil { + if corruptionErr != nil { + return tags, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to repair corrupted tags", corruptionErr) } - } else { - // No valid tags found, remove the entry - f.removeOptionalDataEntry(i) + return tags, err } - break + } else { + f.removeOptionalDataEntry(i) } + break } - return tags, corruptionErr } @@ -246,17 +220,7 @@ func GetFileEntryTagsByType[T any](fe *FileEntry) ([]*generics.Tag[T], error) { if err != nil { return nil, err } - - result := make([]*generics.Tag[T], 0) - for i := range allTags { - // Type assert the value to ensure it's of type T - // If type assertion succeeds, the tag's Type field should match the expected TagValueType for T - if typedValue, ok := allTags[i].Value.(T); ok { - result = append(result, generics.NewTag(allTags[i].Key, typedValue, allTags[i].Type)) - } - } - - return result, nil + return filterTagsByType[T](allTags), nil } // GetFileEntryTag retrieves a type-safe tag by key from a FileEntry. @@ -588,6 +552,8 @@ func SyncFileEntryTags(fe *FileEntry) error { // // Note: This is a standalone function rather than a method due to Go's limitation // of not supporting generic methods on non-generic types. See api_generics.md for details. 
+// +//nolint:gocognit // inheritance and merge logic branches func GetFileEntryEffectiveTags(fe *FileEntry) ([]*generics.Tag[any], error) { // Start with file-level tags fileTags, err := GetFileEntryTags(fe) diff --git a/api/go/metadata/fileentry_tags_test.go b/api/go/metadata/fileentry_tags_test.go index 0f66335d..495357ff 100644 --- a/api/go/metadata/fileentry_tags_test.go +++ b/api/go/metadata/fileentry_tags_test.go @@ -8,19 +8,153 @@ import ( "github.com/novus-engine/novuspack/api/go/pkgerrors" ) +// fileEntryWithOptionalTagsRaw returns a FileEntry with OptionalDataTagsData set to data. +func fileEntryWithOptionalTagsRaw(data []byte) *FileEntry { + fe := NewFileEntry() + fe.OptionalData = []OptionalDataEntry{ + {DataType: OptionalDataTagsData, DataLength: uint16(len(data)), Data: data}, + } + fe.updateOptionalDataLen() + return fe +} + +// fileEntryWithEmptyTagsArray returns a FileEntry with empty JSON array as tags data. +func fileEntryWithEmptyTagsArray() *FileEntry { + return fileEntryWithOptionalTagsRaw([]byte("[]")) +} + +// fileEntryWithCorruptedTagsData returns a FileEntry with invalid JSON as tags data (DataLength 10). +func fileEntryWithCorruptedTagsData() *FileEntry { + fe := NewFileEntry() + fe.OptionalData = []OptionalDataEntry{ + {DataType: OptionalDataTagsData, DataLength: 10, Data: []byte("invalid json")}, + } + fe.updateOptionalDataLen() + return fe +} + +// fileEntryWithPartiallyCorruptedTags returns a FileEntry with one valid tag and appended invalid JSON. +func fileEntryWithPartiallyCorruptedTags() *FileEntry { + validTag := generics.NewTag[any]("valid", "value", generics.TagValueTypeString) + tags := []*generics.Tag[any]{validTag} + tagData, _ := json.Marshal(tags) + corruptedData := append(append([]byte(nil), tagData...), []byte(`,"invalid":}`)...) + return fileEntryWithOptionalTagsRaw(corruptedData) +} + +// fileEntryWithValidTagPlusCorruptedSuffix builds a FileEntry with one valid tag and a corrupted suffix in the tags array. 
+func fileEntryWithValidTagPlusCorruptedSuffix(suffix []byte) *FileEntry { + fe := NewFileEntry() + validTag := generics.NewTag[any]("valid", "value", generics.TagValueTypeString) + tags := []*generics.Tag[any]{validTag} + tagData, _ := json.Marshal(tags) + arr := []byte(`[`) + arr = append(arr, tagData[1:len(tagData)-1]...) + arr = append(arr, suffix...) + arr = append(arr, []byte(`]`)...) + fe.OptionalData = []OptionalDataEntry{ + {DataType: OptionalDataTagsData, DataLength: uint16(len(arr)), Data: arr}, + } + fe.updateOptionalDataLen() + return fe +} + +func fileEntryWithCorruptedIndividualTag() *FileEntry { + return fileEntryWithValidTagPlusCorruptedSuffix([]byte(`,{"invalid":}`)) +} + +func fileEntryWithPartialCorruptionInvalidArray() *FileEntry { + return fileEntryWithValidTagPlusCorruptedSuffix([]byte(`,{"invalid":}]`)) +} + +func fileEntryWithCorruptedArrayValidStructure() *FileEntry { + fe := NewFileEntry() + corruptedArray := []byte(`[{"Key":"tag1","ValueType":0,"Value":null},{"Key":"tag2","ValueType":0,"Value":null}]`) + fe.OptionalData = []OptionalDataEntry{ + {DataType: OptionalDataTagsData, DataLength: uint16(len(corruptedArray)), Data: corruptedArray}, + } + fe.updateOptionalDataLen() + return fe +} + +func fileEntryWithAllCorruptedTags() *FileEntry { + fe := NewFileEntry() + allCorrupted := []byte(`[{"invalid":},{"bad":}]`) + fe.OptionalData = []OptionalDataEntry{ + {DataType: OptionalDataTagsData, DataLength: uint16(len(allCorrupted)), Data: allCorrupted}, + } + fe.updateOptionalDataLen() + return fe +} + +func fileEntryWithInvalidValueTypeTags() *FileEntry { + fe := NewFileEntry() + invalidValueType := []byte(`[{"Key":"tag1","ValueType":17,"Value":"value1"},{"Key":"tag2","ValueType":255,"Value":"value2"}]`) + fe.OptionalData = []OptionalDataEntry{ + {DataType: OptionalDataTagsData, DataLength: uint16(len(invalidValueType)), Data: invalidValueType}, + } + fe.updateOptionalDataLen() + return fe +} + +// tagErrorCase is a single case for tag 
error-handling table tests. +type tagErrorCase struct { + name string + setup func() *FileEntry + run func(*FileEntry) error + wantErr bool +} + +// runTagErrorTable runs a table of tag error-handling cases. +func runTagErrorTable(t *testing.T, cases []tagErrorCase) { + t.Helper() + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + fe := tc.setup() + err := tc.run(fe) + if (err != nil) != tc.wantErr { + t.Errorf("tag op error = %v, wantErr %v", err, tc.wantErr) + } + }) + } +} + +type getFileEntryTagsCase struct { + name string + setup func() *FileEntry + wantCount int + wantErr bool +} + +func runGetFileEntryTagsTable(t *testing.T, tests []getFileEntryTagsCase) { + t.Helper() + runTagsCountTable(t, tests, GetFileEntryTags, "GetFileEntryTags") +} + +// runTagsCountTable runs a table of tag-count tests for any getTags func that returns ([]*Tag[any], error). +func runTagsCountTable(t *testing.T, tests []getFileEntryTagsCase, getTags func(*FileEntry) ([]*generics.Tag[any], error), methodName string) { + t.Helper() + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + fe := tt.setup() + tags, err := getTags(fe) + if (err != nil) != tt.wantErr { + t.Errorf("%s() error = %v, wantErr %v", methodName, err, tt.wantErr) + return + } + if !tt.wantErr && len(tags) != tt.wantCount { + t.Errorf("%s() returned %d tags, want %d", methodName, len(tags), tt.wantCount) + } + }) + } +} + // TestGetFileEntryTags tests GetFileEntryTags function func TestGetFileEntryTags(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - wantCount int - wantErr bool - }{ + tests := []getFileEntryTagsCase{ { - name: "empty tags", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "empty tags", + setup: NewFileEntry, wantCount: 0, wantErr: false, }, @@ -68,231 +202,123 @@ func TestGetFileEntryTags(t *testing.T) { wantErr: false, }, { - name: "corrupted tags data", - setup: func() *FileEntry { - fe := NewFileEntry() - 
fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, + name: "corrupted tags data", + setup: fileEntryWithCorruptedTagsData, wantCount: 0, wantErr: true, }, { - name: "partially corrupted tags", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create tags with one valid and one corrupted tag - validTag := generics.NewTag[any]("valid", "value", generics.TagValueTypeString) - tags := []*generics.Tag[any]{validTag} - tagData, _ := json.Marshal(tags) - // Append invalid JSON to create partial corruption - corruptedData := append(tagData, []byte(`,"invalid":}`)...) - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(corruptedData)), - Data: corruptedData, - }, - } - fe.updateOptionalDataLen() - return fe - }, + name: "partially corrupted tags", + setup: fileEntryWithPartiallyCorruptedTags, wantCount: 1, // Should recover valid tag wantErr: true, // But report corruption error }, { - name: "empty tags array", - setup: func() *FileEntry { - fe := NewFileEntry() - tagData := []byte("[]") - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(tagData)), - Data: tagData, - }, - } - fe.updateOptionalDataLen() - return fe - }, + name: "empty tags array", + setup: fileEntryWithEmptyTagsArray, wantCount: 0, wantErr: false, }, { - name: "array with corrupted individual tag", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create array with one valid tag and one corrupted tag - validTag := generics.NewTag[any]("valid", "value", generics.TagValueTypeString) - tags := []*generics.Tag[any]{validTag} - tagData, _ := json.Marshal(tags) - // Create array with valid tag followed by corrupted tag - corruptedArray := []byte(`[`) - corruptedArray = append(corruptedArray, tagData[1:len(tagData)-1]...) 
// Remove outer brackets - corruptedArray = append(corruptedArray, []byte(`,{"invalid":}`)...) - corruptedArray = append(corruptedArray, []byte(`]`)...) - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(corruptedArray)), - Data: corruptedArray, - }, - } - fe.updateOptionalDataLen() - return fe - }, - wantCount: 1, // Should recover valid tag - wantErr: true, // But report corruption error + name: "array with corrupted individual tag", + setup: fileEntryWithCorruptedIndividualTag, + wantCount: 1, + wantErr: true, }, { - name: "partially corrupted tags", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create tags with one valid and one corrupted tag - validTag := generics.NewTag[any]("valid", "value", generics.TagValueTypeString) - tags := []*generics.Tag[any]{validTag} - tagData, _ := json.Marshal(tags) - // Append invalid JSON to create partial corruption - corruptedData := append(tagData, []byte(`,"invalid":}`)...) - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(corruptedData)), - Data: corruptedData, - }, - } - fe.updateOptionalDataLen() - return fe - }, + name: "partially corrupted tags", + setup: fileEntryWithPartiallyCorruptedTags, wantCount: 1, // Should recover valid tag wantErr: true, // But report corruption error }, { - name: "empty tags array", - setup: func() *FileEntry { - fe := NewFileEntry() - tagData := []byte("[]") - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(tagData)), - Data: tagData, - }, - } - fe.updateOptionalDataLen() - return fe - }, + name: "empty tags array", + setup: fileEntryWithEmptyTagsArray, wantCount: 0, wantErr: false, }, } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - tags, err := GetFileEntryTags(fe) - - if (err != nil) != tt.wantErr { - t.Errorf("GetFileEntryTags() error = %v, wantErr %v", err, tt.wantErr) - 
return - } - - if !tt.wantErr { - if len(tags) != tt.wantCount { - t.Errorf("GetFileEntryTags() returned %d tags, want %d", len(tags), tt.wantCount) - } - } - }) - } + runGetFileEntryTagsTable(t, tests) } -// TestGetFileEntryTagsByType tests GetFileEntryTagsByType function -func TestGetFileEntryTagsByType(t *testing.T) { +func fileEntryWithTagsForGetByType(t *testing.T) *FileEntry { + t.Helper() fe := NewFileEntry() - // Add tags using AddFileEntryTag to ensure proper serialization _ = AddFileEntryTag(fe, "str1", "value1", generics.TagValueTypeString) _ = AddFileEntryTag(fe, "str2", "value2", generics.TagValueTypeString) _ = AddFileEntryTag(fe, "int1", int64(42), generics.TagValueTypeInteger) _ = AddFileEntryTag(fe, "int2", int64(100), generics.TagValueTypeInteger) _ = AddFileEntryTag(fe, "bool1", true, generics.TagValueTypeBoolean) _ = AddFileEntryTag(fe, "float1", 3.14, generics.TagValueTypeFloat) + return fe +} - tests := []struct { - name string - wantCount int - wantErr bool - }{ - {"string tags", 2, false}, - {"integer tags", 2, false}, - {"boolean tags", 1, false}, - {"float tags", 1, false}, +func runGetFileEntryTagsByTypeCount(t *testing.T, fe *FileEntry, tagType generics.TagValueType, wantCount int) { + t.Helper() + allTags, err := GetFileEntryTags(fe) + if err != nil { + t.Errorf("GetFileEntryTags() error = %v", err) + return + } + var count int + for _, tag := range allTags { + if tag.Type == tagType { + count++ + } + } + if count != wantCount { + t.Errorf("tag type %v count = %d, want %d", tagType, count, wantCount) } +} - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - var err error - var result []*generics.Tag[any] - - switch tt.name { - case "string tags": - strTags, err := GetFileEntryTagsByType[string](fe) - if err == nil { - result = make([]*generics.Tag[any], len(strTags)) - for i, t := range strTags { - result[i] = generics.NewTag[any](t.Key, t.Value, t.Type) - } - } - case "integer tags": - // Note: After JSON round-trip, 
integers may be float64, so we check by TagValueType - allTags, err := GetFileEntryTags(fe) - if err == nil { - for _, tag := range allTags { - if tag.Type == generics.TagValueTypeInteger { - result = append(result, tag) - } - } - } - case "boolean tags": - boolTags, err := GetFileEntryTagsByType[bool](fe) - if err == nil { - result = make([]*generics.Tag[any], len(boolTags)) - for i, t := range boolTags { - result[i] = generics.NewTag[any](t.Key, t.Value, t.Type) - } - } - case "float tags": - // Note: After JSON round-trip, both integers and floats may be float64 - // So we check by TagValueType to distinguish - allTags, err := GetFileEntryTags(fe) - if err == nil { - for _, tag := range allTags { - if tag.Type == generics.TagValueTypeFloat { - result = append(result, tag) - } - } - } - } +type getFileEntryTagsByTypeCase struct { + name string + wantCount int + getLen func(*FileEntry) (int, error) +} - if (err != nil) != tt.wantErr { - t.Errorf("GetFileEntryTagsByType() error = %v, wantErr %v", err, tt.wantErr) +func runGetFileEntryTagsByTypeTable(t *testing.T, fe *FileEntry, cases []getFileEntryTagsByTypeCase) { + t.Helper() + for _, c := range cases { + t.Run(c.name, func(t *testing.T) { + n, err := c.getLen(fe) + if err != nil { + t.Errorf("GetFileEntryTagsByType() error = %v", err) return } - - if len(result) != tt.wantCount { - t.Errorf("GetFileEntryTagsByType() returned %d tags, want %d", len(result), tt.wantCount) + if n != c.wantCount { + t.Errorf("GetFileEntryTagsByType() returned %d tags, want %d", n, c.wantCount) } }) } } +// TestGetFileEntryTagsByType tests GetFileEntryTagsByType function +func TestGetFileEntryTagsByType(t *testing.T) { + fe := fileEntryWithTagsForGetByType(t) + runGetFileEntryTagsByTypeTable(t, fe, []getFileEntryTagsByTypeCase{ + {"string tags", 2, func(f *FileEntry) (int, error) { + tags, err := GetFileEntryTagsByType[string](f) + if err != nil { + return 0, err + } + return len(tags), nil + }}, + {"boolean tags", 1, func(f 
*FileEntry) (int, error) { + tags, err := GetFileEntryTagsByType[bool](f) + if err != nil { + return 0, err + } + return len(tags), nil + }}, + }) + t.Run("integer tags", func(t *testing.T) { runGetFileEntryTagsByTypeCount(t, fe, generics.TagValueTypeInteger, 2) }) + t.Run("float tags", func(t *testing.T) { runGetFileEntryTagsByTypeCount(t, fe, generics.TagValueTypeFloat, 1) }) +} + // TestGetFileEntryTag tests GetFileEntryTag function +// +//nolint:gocognit // table-driven get-tag cases func TestGetFileEntryTag(t *testing.T) { fe := NewFileEntry() tags := []*generics.Tag[any]{ @@ -386,6 +412,8 @@ func TestGetFileEntryTag_WithAny(t *testing.T) { } // TestAddFileEntryTag tests AddFileEntryTag function +// +//nolint:gocognit // table-driven add-tag cases func TestAddFileEntryTag(t *testing.T) { tests := []struct { name string @@ -402,7 +430,7 @@ func TestAddFileEntryTag(t *testing.T) { value: "John Doe", tagType: generics.TagValueTypeString, wantErr: false, - setup: func() *FileEntry { return NewFileEntry() }, + setup: NewFileEntry, }, { name: "add integer tag", @@ -410,7 +438,7 @@ func TestAddFileEntryTag(t *testing.T) { value: int64(1), tagType: generics.TagValueTypeInteger, wantErr: false, - setup: func() *FileEntry { return NewFileEntry() }, + setup: NewFileEntry, }, { name: "add boolean tag", @@ -418,7 +446,7 @@ func TestAddFileEntryTag(t *testing.T) { value: true, tagType: generics.TagValueTypeBoolean, wantErr: false, - setup: func() *FileEntry { return NewFileEntry() }, + setup: NewFileEntry, }, { name: "duplicate key error", @@ -477,6 +505,8 @@ func TestAddFileEntryTag(t *testing.T) { } // TestSetFileEntryTag tests SetFileEntryTag function +// +//nolint:gocognit // table-driven set-tag cases func TestSetFileEntryTag(t *testing.T) { fe := NewFileEntry() _ = AddFileEntryTag(fe, "author", "John Doe", generics.TagValueTypeString) @@ -535,6 +565,8 @@ func TestSetFileEntryTag(t *testing.T) { } // TestAddFileEntryTags tests AddFileEntryTags function +// 
+//nolint:gocognit // table-driven add-tags cases func TestAddFileEntryTags(t *testing.T) { tests := []struct { name string @@ -550,7 +582,7 @@ func TestAddFileEntryTags(t *testing.T) { generics.NewTag[any]("key2", int64(42), generics.TagValueTypeInteger), }, wantErr: false, - setup: func() *FileEntry { return NewFileEntry() }, + setup: NewFileEntry, }, { name: "duplicate key error", @@ -560,7 +592,7 @@ func TestAddFileEntryTags(t *testing.T) { }, wantErr: true, errType: pkgerrors.ErrTypeValidation, - setup: func() *FileEntry { return NewFileEntry() }, + setup: NewFileEntry, }, { name: "duplicate with existing tag", @@ -608,6 +640,8 @@ func TestAddFileEntryTags(t *testing.T) { } // TestSetFileEntryTags tests SetFileEntryTags function +// +//nolint:gocognit // table-driven set-tags cases func TestSetFileEntryTags(t *testing.T) { fe := NewFileEntry() _ = AddFileEntryTag(fe, "key1", "value1", generics.TagValueTypeString) @@ -658,6 +692,8 @@ func TestSetFileEntryTags(t *testing.T) { } // TestRemoveFileEntryTag tests RemoveFileEntryTag function +// +//nolint:gocognit // table-driven remove-tag cases func TestRemoveFileEntryTag(t *testing.T) { fe := NewFileEntry() _ = AddFileEntryTag(fe, "key1", "value1", generics.TagValueTypeString) @@ -736,30 +772,16 @@ func TestHasFileEntryTag(t *testing.T) { want: false, }, { - name: "empty file entry", - setup: func() *FileEntry { - return NewFileEntry() - }, - key: "anykey", - want: false, + name: "empty file entry", + setup: NewFileEntry, + key: "anykey", + want: false, }, { - name: "tag with error in getTagsFromOptionalData", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - key: "anykey", - want: false, // Should return false on error + name: "tag with error in getTagsFromOptionalData", + setup: 
fileEntryWithCorruptedTagsData, + key: "anykey", + want: false, // Should return false on error }, } @@ -791,11 +813,9 @@ func TestHasFileEntryTags(t *testing.T) { want: true, }, { - name: "no tags", - setup: func() *FileEntry { - return NewFileEntry() - }, - want: false, + name: "no tags", + setup: NewFileEntry, + want: false, }, { name: "multiple tags", @@ -808,21 +828,9 @@ func TestHasFileEntryTags(t *testing.T) { want: true, }, { - name: "tag with error in getTagsFromOptionalData", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - want: false, // Should return false on error + name: "tag with error in getTagsFromOptionalData", + setup: fileEntryWithCorruptedTagsData, + want: false, // Should return false on error }, } @@ -854,10 +862,8 @@ func TestSyncFileEntryTags(t *testing.T) { wantErr: false, }, { - name: "sync with no tags", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "sync with no tags", + setup: NewFileEntry, wantErr: false, }, { @@ -979,12 +985,7 @@ func TestSyncTagsToOptionalData_EdgeCases(t *testing.T) { // TestGetFileEntryEffectiveTags tests GetFileEntryEffectiveTags function func TestGetFileEntryEffectiveTags(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - wantCount int - wantErr bool - }{ + tests := []getFileEntryTagsCase{ { name: "file with tags", setup: func() *FileEntry { @@ -996,10 +997,8 @@ func TestGetFileEntryEffectiveTags(t *testing.T) { wantErr: false, }, { - name: "file with no tags", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "file with no tags", + setup: NewFileEntry, wantCount: 0, wantErr: false, }, @@ -1060,50 +1059,12 @@ func TestGetFileEntryEffectiveTags(t *testing.T) { wantErr: false, }, } - - for _, tt := range tests { - t.Run(tt.name, 
func(t *testing.T) { - fe := tt.setup() - effectiveTags, err := GetFileEntryEffectiveTags(fe) - - if (err != nil) != tt.wantErr { - t.Errorf("GetFileEntryEffectiveTags() error = %v, wantErr %v", err, tt.wantErr) - return - } - - if len(effectiveTags) != tt.wantCount { - t.Errorf("GetFileEntryEffectiveTags() returned %d tags, want %d", len(effectiveTags), tt.wantCount) - } - }) - } + runTagsCountTable(t, tests, GetFileEntryEffectiveTags, "GetFileEntryEffectiveTags") } // TestGetFileEntryInheritedTags tests GetFileEntryInheritedTags function func TestGetFileEntryInheritedTags(t *testing.T) { - // Create hierarchy: root -> parent -> child - root := &PathMetadataEntry{ - Path: generics.PathEntry{PathLength: 1, Path: "/"}, - Type: PathMetadataTypeDirectory, - Inheritance: &PathInheritance{ - Enabled: true, - Priority: 1, - }, - Properties: []*generics.Tag[any]{ - {Key: "root-tag", Value: "root-value", Type: generics.TagValueTypeString}, - }, - } - - parent := &PathMetadataEntry{ - Path: generics.PathEntry{PathLength: 4, Path: "dir"}, - Type: PathMetadataTypeDirectory, - Inheritance: &PathInheritance{ - Enabled: true, - Priority: 2, - }, - Properties: []*generics.Tag[any]{ - {Key: "parent-tag", Value: "parent-value", Type: generics.TagValueTypeString}, - }, - } + root, parent := pathMetadataRootParentFixture() parent.SetParentPath(root) fe := NewFileEntry() @@ -1119,17 +1080,10 @@ func TestGetFileEntryInheritedTags(t *testing.T) { _ = fe.AssociateWithPathMetadata(child) - tests := []struct { - name string - setup func() *FileEntry - wantCount int - wantErr bool - }{ + tests := []getFileEntryTagsCase{ { - name: "file with no inheritance", - setup: func() *FileEntry { - return NewFileEntry() - }, + name: "file with no inheritance", + setup: NewFileEntry, wantCount: 0, wantErr: false, }, @@ -1152,23 +1106,8 @@ func TestGetFileEntryInheritedTags(t *testing.T) { wantErr: false, }, } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - 
inheritedTags, err := GetFileEntryInheritedTags(fe) - - if (err != nil) != tt.wantErr { - t.Errorf("GetFileEntryInheritedTags() error = %v, wantErr %v", err, tt.wantErr) - return - } - - if len(inheritedTags) != tt.wantCount { - t.Errorf("GetFileEntryInheritedTags() returned %d tags, want %d", len(inheritedTags), tt.wantCount) - } - }) - } -} + runTagsCountTable(t, tests, GetFileEntryInheritedTags, "GetFileEntryInheritedTags") +} // TestFileEntryTagOperations_AllValueTypes tests tag operations with all TagValueType values func TestFileEntryTagOperations_AllValueTypes(t *testing.T) { @@ -1231,326 +1170,86 @@ func TestFileEntryTagOperations_AllValueTypes(t *testing.T) { // TestGetFileEntryTags_CorruptionScenarios tests various corruption scenarios func TestGetFileEntryTags_CorruptionScenarios(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - wantCount int - wantErr bool - }{ + tests := []getFileEntryTagsCase{ { - name: "all tags corrupted but array structure valid", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create array with tags that have invalid structure - // Use tags that will unmarshal but have zero ValueType (which is invalid) - corruptedArray := []byte(`[{"Key":"tag1","ValueType":0,"Value":null},{"Key":"tag2","ValueType":0,"Value":null}]`) - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(corruptedArray)), - Data: corruptedArray, - }, - } - fe.updateOptionalDataLen() - return fe - }, - wantCount: 2, // Tags unmarshal successfully (zero values are valid) + name: "all tags corrupted but array structure valid", + setup: fileEntryWithCorruptedArrayValidStructure, + wantCount: 2, wantErr: false, }, { - name: "partial corruption with some valid tags", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create array with mix of valid and corrupted tags - // Use the same approach as existing "array with corrupted individual tag" test - // Create valid tag first - 
validTag := generics.NewTag[any]("valid", "value", generics.TagValueTypeString) - tags := []*generics.Tag[any]{validTag} - tagData, _ := json.Marshal(tags) - // Create array with valid tag followed by corrupted tag - // Use the same approach as existing "array with corrupted individual tag" test - // The corrupted tag {"invalid":} is invalid JSON, so it will fail array parsing - // This tests the path where array structure itself is invalid - partialCorruption := []byte(`[`) - partialCorruption = append(partialCorruption, tagData[1:len(tagData)-1]...) // Remove outer brackets - partialCorruption = append(partialCorruption, []byte(`,{"invalid":}]`)...) - partialCorruption = append(partialCorruption, []byte(`]`)...) - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(partialCorruption)), - Data: partialCorruption, - }, - } - fe.updateOptionalDataLen() - return fe - }, - wantCount: 0, // Array structure invalid, so no tags parsed - wantErr: true, // But report corruption error + name: "partial corruption with some valid tags", + setup: fileEntryWithPartialCorruptionInvalidArray, + wantCount: 0, + wantErr: true, }, { - name: "all tags corrupted - entry removed", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create array with all tags corrupted (invalid JSON objects) - // Use valid array structure but with tags that fail individual unmarshaling - allCorrupted := []byte(`[{"invalid":},{"bad":}]`) - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(allCorrupted)), - Data: allCorrupted, - }, - } - fe.updateOptionalDataLen() - return fe - }, - wantCount: 0, // No valid tags, entry should be removed - wantErr: true, // Corruption error should be returned + name: "all tags corrupted - entry removed", + setup: fileEntryWithAllCorruptedTags, + wantCount: 0, + wantErr: true, }, { - name: "tags with invalid ValueType greater than maximum", - setup: func() *FileEntry 
{ - fe := NewFileEntry() - // Create array with tags that have valid JSON structure but invalid ValueType - // ValueType 0x11 (17) is greater than TagValueTypeNovusPackMetadata (0x10) - invalidValueType := []byte(`[{"Key":"tag1","ValueType":17,"Value":"value1"},{"Key":"tag2","ValueType":255,"Value":"value2"}]`) - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: uint16(len(invalidValueType)), - Data: invalidValueType, - }, - } - fe.updateOptionalDataLen() - return fe - }, - wantCount: 0, // Invalid tags should be skipped - wantErr: true, // Corruption error should be returned + name: "tags with invalid ValueType greater than maximum", + setup: fileEntryWithInvalidValueTypeTags, + wantCount: 0, + wantErr: true, }, } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - tags, err := GetFileEntryTags(fe) - - if (err != nil) != tt.wantErr { - t.Errorf("GetFileEntryTags() error = %v, wantErr %v", err, tt.wantErr) - if err != nil { - t.Errorf("Error details: %+v", err) - } - return - } - - if len(tags) != tt.wantCount { - t.Errorf("GetFileEntryTags() returned %d tags, want %d", len(tags), tt.wantCount) - } - }) - } + runGetFileEntryTagsTable(t, tests) } -// TestAddFileEntryTag_ErrorHandling tests error handling in AddFileEntryTag -func TestAddFileEntryTag_ErrorHandling(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - key string - value string - wantErr bool - }{ +// TestAddAndSetFileEntryTag_ErrorHandling covers error handling for Add and Set (same getTagsFromOptionalData path). 
+func TestAddAndSetFileEntryTag_ErrorHandling(t *testing.T) { + runTagErrorTable(t, []tagErrorCase{ { - name: "error from getTagsFromOptionalData", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - key: "test", - value: "value", + name: "AddFileEntryTag: error from getTagsFromOptionalData", + setup: fileEntryWithCorruptedTagsData, + run: func(fe *FileEntry) error { return AddFileEntryTag(fe, "test", "value", generics.TagValueTypeString) }, wantErr: true, }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - err := AddFileEntryTag(fe, tt.key, tt.value, generics.TagValueTypeString) - - if (err != nil) != tt.wantErr { - t.Errorf("AddFileEntryTag() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } -} - -// TestSetFileEntryTag_ErrorHandling tests error handling in SetFileEntryTag -func TestSetFileEntryTag_ErrorHandling(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - key string - value string - wantErr bool - }{ { - name: "error from getTagsFromOptionalData", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - key: "test", - value: "value", + name: "SetFileEntryTag: error from getTagsFromOptionalData", + setup: fileEntryWithCorruptedTagsData, + run: func(fe *FileEntry) error { return SetFileEntryTag(fe, "test", "value", generics.TagValueTypeString) }, wantErr: true, }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - err := SetFileEntryTag(fe, tt.key, tt.value, generics.TagValueTypeString) - - if (err != nil) 
!= tt.wantErr { - t.Errorf("SetFileEntryTag() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } + }) } // TestRemoveFileEntryTag_ErrorHandling tests error handling in RemoveFileEntryTag func TestRemoveFileEntryTag_ErrorHandling(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - key string - wantErr bool - }{ - { - name: "error from getTagsFromOptionalData", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - key: "test", - wantErr: true, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - err := RemoveFileEntryTag(fe, tt.key) - - if (err != nil) != tt.wantErr { - t.Errorf("RemoveFileEntryTag() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } + runTagErrorTable(t, []tagErrorCase{{ + name: "error from getTagsFromOptionalData", + setup: fileEntryWithCorruptedTagsData, + run: func(fe *FileEntry) error { return RemoveFileEntryTag(fe, "test") }, + wantErr: true, + }}) } // TestGetFileEntryEffectiveTags_ErrorHandling tests error handling in GetFileEntryEffectiveTags func TestGetFileEntryEffectiveTags_ErrorHandling(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - wantErr bool - }{ - { - name: "error from GetFileEntryTags", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - wantErr: true, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - _, err := GetFileEntryEffectiveTags(fe) - - if (err != nil) != tt.wantErr { - t.Errorf("GetFileEntryEffectiveTags() error = %v, wantErr 
%v", err, tt.wantErr) - } - }) - } + runTagErrorTable(t, []tagErrorCase{{ + name: "error from GetFileEntryTags", + setup: fileEntryWithCorruptedTagsData, + run: func(fe *FileEntry) error { _, err := GetFileEntryEffectiveTags(fe); return err }, + wantErr: true, + }}) } // TestGetFileEntryTagsByType_ErrorHandling tests error handling in GetFileEntryTagsByType func TestGetFileEntryTagsByType_ErrorHandling(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - wantErr bool - }{ - { - name: "error from GetFileEntryTags", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - wantErr: true, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - _, err := GetFileEntryTagsByType[string](fe) - - if (err != nil) != tt.wantErr { - t.Errorf("GetFileEntryTagsByType() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } + runTagErrorTable(t, []tagErrorCase{{ + name: "error from GetFileEntryTags", + setup: fileEntryWithCorruptedTagsData, + run: func(fe *FileEntry) error { _, err := GetFileEntryTagsByType[string](fe); return err }, + wantErr: true, + }}) } // TestGetFileEntryTag_ErrorHandling tests error handling in GetFileEntryTag +// +//nolint:gocognit // table-driven error-handling cases func TestGetFileEntryTag_ErrorHandling(t *testing.T) { tests := []struct { name string @@ -1560,20 +1259,8 @@ func TestGetFileEntryTag_ErrorHandling(t *testing.T) { errType pkgerrors.ErrorType }{ { - name: "error from getTagsFromOptionalData", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, 
+ name: "error from getTagsFromOptionalData", + setup: fileEntryWithCorruptedTagsData, key: "test", wantErr: true, errType: pkgerrors.ErrTypeCorruption, @@ -1620,86 +1307,20 @@ func TestGetFileEntryTag_ErrorHandling(t *testing.T) { } } +var fileEntryTagsErrorCase = []*generics.Tag[any]{generics.NewTag[any]("key1", "value1", generics.TagValueTypeString)} + // TestAddFileEntryTags_ErrorHandling tests error handling in AddFileEntryTags func TestAddFileEntryTags_ErrorHandling(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - tags []*generics.Tag[any] - wantErr bool - }{ - { - name: "error from getTagsFromOptionalData", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - tags: []*generics.Tag[any]{ - generics.NewTag[any]("key1", "value1", generics.TagValueTypeString), - }, - wantErr: true, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - err := AddFileEntryTags(fe, tt.tags) - - if (err != nil) != tt.wantErr { - t.Errorf("AddFileEntryTags() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } + runTagErrorTable(t, []tagErrorCase{{ + name: "error from getTagsFromOptionalData", setup: fileEntryWithCorruptedTagsData, + run: func(fe *FileEntry) error { return AddFileEntryTags(fe, fileEntryTagsErrorCase) }, wantErr: true, + }}) } // TestSetFileEntryTags_ErrorHandling tests error handling in SetFileEntryTags func TestSetFileEntryTags_ErrorHandling(t *testing.T) { - tests := []struct { - name string - setup func() *FileEntry - tags []*generics.Tag[any] - wantErr bool - }{ - { - name: "error from getTagsFromOptionalData", - setup: func() *FileEntry { - fe := NewFileEntry() - // Create corrupted optional data - fe.OptionalData = []OptionalDataEntry{ - { - DataType: 
OptionalDataTagsData, - DataLength: 10, - Data: []byte("invalid json"), - }, - } - fe.updateOptionalDataLen() - return fe - }, - tags: []*generics.Tag[any]{ - generics.NewTag[any]("key1", "value1", generics.TagValueTypeString), - }, - wantErr: true, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - fe := tt.setup() - err := SetFileEntryTags(fe, tt.tags) - - if (err != nil) != tt.wantErr { - t.Errorf("SetFileEntryTags() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } + runTagErrorTable(t, []tagErrorCase{{ + name: "error from getTagsFromOptionalData", setup: fileEntryWithCorruptedTagsData, + run: func(fe *FileEntry) error { return SetFileEntryTags(fe, fileEntryTagsErrorCase) }, wantErr: true, + }}) } diff --git a/api/go/metadata/fileentry_test.go b/api/go/metadata/fileentry_test.go index 01b59e88..844f88ec 100644 --- a/api/go/metadata/fileentry_test.go +++ b/api/go/metadata/fileentry_test.go @@ -12,30 +12,38 @@ import ( "github.com/novus-engine/novuspack/api/go/internal/testhelpers" ) -// TestFileEntryFixedSize verifies the fixed section is exactly 64 bytes -func TestFileEntryFixedSize(t *testing.T) { - // Create a minimal FileEntry with only fixed fields - type FileEntryFixed struct { - FileID uint64 - OriginalSize uint64 - StoredSize uint64 - RawChecksum uint32 - StoredChecksum uint32 - FileVersion uint32 - MetadataVersion uint32 - PathCount uint16 - Type uint16 - CompressionType uint8 - CompressionLevel uint8 - EncryptionType uint8 - HashCount uint8 - HashDataOffset uint32 - HashDataLen uint16 - OptionalDataLen uint16 - OptionalDataOffset uint32 - Reserved uint32 +var fileEntryTestPath = []generics.PathEntry{{PathLength: 8, Path: "test.txt"}} + +func fileEntryWithPathAndHashes(hashDataLen int) FileEntry { + return FileEntry{ + FileID: 1, + PathCount: 1, + Paths: fileEntryTestPath, + HashCount: 1, + Hashes: []HashEntry{ + { + HashType: fileformat.HashTypeSHA256, + HashPurpose: fileformat.HashPurposeContentVerification, + 
HashLength: 32, + HashData: make([]byte, hashDataLen), + }, + }, + } +} + +func fileEntryWithPathAndOptionalData(dataLen uint16, dataBytes int) FileEntry { + return FileEntry{ + FileID: 1, + PathCount: 1, + Paths: fileEntryTestPath, + OptionalData: []OptionalDataEntry{ + {DataType: OptionalDataTagsData, DataLength: dataLen, Data: make([]byte, dataBytes)}, + }, } +} +// TestFileEntryFixedSize verifies the fixed section is exactly 64 bytes +func TestFileEntryFixedSize(t *testing.T) { var fixed FileEntryFixed size := binary.Size(fixed) @@ -100,56 +108,10 @@ func TestFileEntryValidation(t *testing.T) { }, true, }, - { - "Invalid hash entry", - FileEntry{ - FileID: 1, - PathCount: 1, - Paths: []generics.PathEntry{{PathLength: 8, Path: "test.txt"}}, - HashCount: 1, - Hashes: []HashEntry{ - {HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, HashLength: 32, HashData: make([]byte, 16)}, - }, - }, - true, - }, - { - "Invalid optional data entry", - FileEntry{ - FileID: 1, - PathCount: 1, - Paths: []generics.PathEntry{{PathLength: 8, Path: "test.txt"}}, - OptionalData: []OptionalDataEntry{ - {DataType: OptionalDataTagsData, DataLength: 10, Data: make([]byte, 5)}, - }, - }, - true, - }, - { - "Valid entry with hashes", - FileEntry{ - FileID: 1, - PathCount: 1, - Paths: []generics.PathEntry{{PathLength: 8, Path: "test.txt"}}, - HashCount: 1, - Hashes: []HashEntry{ - {HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, HashLength: 32, HashData: make([]byte, 32)}, - }, - }, - false, - }, - { - "Valid entry with optional data", - FileEntry{ - FileID: 1, - PathCount: 1, - Paths: []generics.PathEntry{{PathLength: 8, Path: "test.txt"}}, - OptionalData: []OptionalDataEntry{ - {DataType: OptionalDataTagsData, DataLength: 10, Data: make([]byte, 10)}, - }, - }, - false, - }, + {"Invalid hash entry", fileEntryWithPathAndHashes(16), true}, + {"Invalid optional data entry", 
fileEntryWithPathAndOptionalData(10, 5), true}, + {"Valid entry with hashes", fileEntryWithPathAndHashes(32), false}, + {"Valid entry with optional data", fileEntryWithPathAndOptionalData(10, 10), false}, { "Valid entry with hashes and optional data", FileEntry{ @@ -168,14 +130,11 @@ func TestFileEntryValidation(t *testing.T) { }, } - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - err := tt.entry.Validate() - if (err != nil) != tt.wantErr { - t.Errorf("Validate() error = %v, wantErr %v", err, tt.wantErr) - } - }) + cases := make([]validateCase, len(tests)) + for i := range tests { + cases[i] = validateCase{name: tests[i].name, subject: &tests[i].entry, wantErr: tests[i].wantErr} } + runValidateTable(t, cases) } // TestFileEntrySizeCalculation verifies size calculations @@ -276,6 +235,8 @@ func TestFileEntryValidationInvalidPath(t *testing.T) { } // TestNewFileEntry verifies NewFileEntry initializes correctly +// +//nolint:gocognit,gocyclo // table-driven init cases func TestNewFileEntry(t *testing.T) { entry := NewFileEntry() @@ -390,32 +351,18 @@ func TestFileEntryReadFromFixedOnly(t *testing.T) { } } +func fileEntryWithTestPathsAndHashes() FileEntry { + return FileEntry{ + FileID: 1, OriginalSize: 1000, StoredSize: 800, FileVersion: 1, MetadataVersion: 1, + PathCount: 1, HashCount: 1, Reserved: 0, + Paths: []generics.PathEntry{{PathLength: 8, Path: "test.txt"}}, + Hashes: []HashEntry{{HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, HashLength: 32, HashData: make([]byte, 32)}}, + } +} + // TestFileEntryReadFromWithVariableData verifies ReadFrom with variable-length data func TestFileEntryReadFromWithVariableData(t *testing.T) { - entry := FileEntry{ - FileID: 1, - OriginalSize: 1000, - StoredSize: 800, - FileVersion: 1, - MetadataVersion: 1, - PathCount: 1, - HashCount: 1, - Reserved: 0, - Paths: []generics.PathEntry{ - { - PathLength: 8, - Path: "test.txt", - }, - }, - Hashes: []HashEntry{ - { - 
HashType: fileformat.HashTypeSHA256, - HashPurpose: fileformat.HashPurposeContentVerification, - HashLength: 32, - HashData: make([]byte, 32), - }, - }, - } + entry := fileEntryWithTestPathsAndHashes() // Update PathLength to match actual path length for i := range entry.Paths { @@ -453,6 +400,22 @@ func TestFileEntryReadFromWithVariableData(t *testing.T) { } } +// runReadFromIncompleteExpectError runs subtests that expect ReadFrom to return an error for incomplete data. +func runReadFromIncompleteExpectError(t *testing.T, tests []struct { + name string + data []byte +}, readFrom func([]byte) error) { + t.Helper() + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + err := readFrom(tt.data) + if err == nil { + t.Errorf("ReadFrom() expected error for incomplete data, got nil") + } + }) + } +} + // TestFileEntryReadFromIncompleteData verifies ReadFrom handles incomplete data func TestFileEntryReadFromIncompleteData(t *testing.T) { tests := []struct { @@ -463,18 +426,11 @@ func TestFileEntryReadFromIncompleteData(t *testing.T) { {"Partial fixed", make([]byte, 32)}, {"Almost complete fixed", make([]byte, 63)}, } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - var entry FileEntry - r := bytes.NewReader(tt.data) - _, err := entry.ReadFrom(r) - - if err == nil { - t.Errorf("ReadFrom() expected error for incomplete data, got nil") - } - }) - } + runReadFromIncompleteExpectError(t, tests, func(data []byte) error { + var entry FileEntry + _, err := entry.ReadFrom(bytes.NewReader(data)) + return err + }) } // TestFileEntryWriteToFixedOnly verifies WriteTo with fixed section and data @@ -522,30 +478,7 @@ func TestFileEntryWriteToFixedOnly(t *testing.T) { // TestFileEntryWriteToWithVariableData verifies WriteTo with variable-length data func TestFileEntryWriteToWithVariableData(t *testing.T) { - entry := FileEntry{ - FileID: 1, - OriginalSize: 1000, - StoredSize: 800, - FileVersion: 1, - MetadataVersion: 1, - PathCount: 1, - HashCount: 1, 
- Reserved: 0, - Paths: []generics.PathEntry{ - { - PathLength: 8, - Path: "test.txt", - }, - }, - Hashes: []HashEntry{ - { - HashType: fileformat.HashTypeSHA256, - HashPurpose: fileformat.HashPurposeContentVerification, - HashLength: 32, - HashData: make([]byte, 32), - }, - }, - } + entry := fileEntryWithTestPathsAndHashes() // Update counts entry.PathCount = uint16(len(entry.Paths)) @@ -647,6 +580,8 @@ func TestFileEntryRoundTrip(t *testing.T) { } // TestFileEntryWriteToErrorPaths verifies WriteTo error handling +// +//nolint:gocognit // table-driven error paths func TestFileEntryWriteToErrorPaths(t *testing.T) { tests := []struct { name string @@ -742,6 +677,8 @@ func TestFileEntryWriteToErrorPaths(t *testing.T) { } // TestFileEntryReadFromErrorPaths verifies ReadFrom error handling for edge cases +// +//nolint:gocognit // table-driven error paths func TestFileEntryReadFromErrorPaths(t *testing.T) { tests := []struct { name string @@ -882,7 +819,7 @@ func TestFileEntryReadFromErrorPaths(t *testing.T) { padding := make([]byte, 100-10) buf.Write(padding) // Write hash - if _, err := entry.Hashes[0].WriteTo(&buf); err != nil { + if _, err := entry.Hashes[0].writeTo(&buf); err != nil { panic(err) } return bytes.NewReader(buf.Bytes()) @@ -985,7 +922,7 @@ func TestFileEntryReadFromErrorPaths(t *testing.T) { } // Write hash (36 bytes) hashEntry := HashEntry{HashType: fileformat.HashTypeSHA256, HashLength: 32, HashData: make([]byte, 32)} - if _, err := hashEntry.WriteTo(&buf); err != nil { + if _, err := hashEntry.writeTo(&buf); err != nil { panic(err) } // Only write 50 bytes of padding (need more to reach optional data at offset 300, but only write 50) diff --git a/api/go/metadata/hashentry.go b/api/go/metadata/hashentry.go index 6f3c5014..c9e917b8 100644 --- a/api/go/metadata/hashentry.go +++ b/api/go/metadata/hashentry.go @@ -10,7 +10,6 @@ package metadata import ( "encoding/binary" - "fmt" "io" "github.com/novus-engine/novuspack/api/go/pkgerrors" @@ -47,30 +46,14 
@@ type HashEntry struct { // - HashType must be valid // // Returns an error if any validation check fails. -func (h *HashEntry) Validate() error { - if len(h.HashData) == 0 { - return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "hash data cannot be nil or empty", nil, pkgerrors.ValidationErrorContext{ - Field: "HashData", - Value: nil, - Expected: "non-empty hash data", - }) - } - - if uint16(len(h.HashData)) != h.HashLength { - return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "hash length mismatch", nil, pkgerrors.ValidationErrorContext{ - Field: "HashLength", - Value: h.HashLength, - Expected: fmt.Sprintf("%d", len(h.HashData)), - }) - } - - return nil +func (h *HashEntry) validate() error { + return validateSliceLength(len(h.HashData), h.HashLength, "HashData", "hash data cannot be nil or empty", "non-empty hash data") } // Size returns the total size of the HashEntry in bytes. // // Specification: package_file_format.md: 4.1.4.3 Hash Data -func (h *HashEntry) Size() int { +func (h HashEntry) size() int { return 4 + int(h.HashLength) // Type(1) + Purpose(1) + Length(2) + Data } @@ -85,7 +68,7 @@ func (h *HashEntry) Size() int { // Returns the number of bytes read and any error encountered. 
// // Specification: package_file_format.md: 4.1.4.3 Hash Data -func (h *HashEntry) ReadFrom(r io.Reader) (int64, error) { +func (h *HashEntry) readFrom(r io.Reader) (int64, error) { var totalRead int64 // Read HashType (1 byte) @@ -125,28 +108,12 @@ func (h *HashEntry) ReadFrom(r io.Reader) (int64, error) { h.HashLength = hashLength // Read HashData (HashLength bytes) - if hashLength > 0 { - hashData := make([]byte, hashLength) - n, err := io.ReadFull(r, hashData) - if err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read hash data", pkgerrors.ValidationErrorContext{ - Field: "HashData", - Value: hashLength, - Expected: "hash data", - }) - } - if uint16(n) != hashLength { - return totalRead, pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "incomplete hash data read", nil, pkgerrors.ValidationErrorContext{ - Field: "HashData", - Value: n, - Expected: fmt.Sprintf("%d bytes", hashLength), - }) - } - totalRead += int64(n) - h.HashData = hashData - } else { - h.HashData = nil + hashData, n, err := readLengthPrefixedBytes(r, hashLength, "HashData", "hash data") + if err != nil { + return totalRead, err } + totalRead += n + h.HashData = hashData return totalRead, nil } @@ -162,7 +129,7 @@ func (h *HashEntry) ReadFrom(r io.Reader) (int64, error) { // Returns the number of bytes written and any error encountered. 
// // Specification: package_file_format.md: 4.1.4.3 Hash Data -func (h *HashEntry) WriteTo(w io.Writer) (int64, error) { +func (h *HashEntry) writeTo(w io.Writer) (int64, error) { var totalWritten int64 // Write HashType (1 byte) @@ -196,31 +163,11 @@ func (h *HashEntry) WriteTo(w io.Writer) (int64, error) { totalWritten += 2 // Write HashData (HashLength bytes) - if h.HashLength > 0 { - if uint16(len(h.HashData)) != h.HashLength { - return totalWritten, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "hash length mismatch", nil, pkgerrors.ValidationErrorContext{ - Field: "HashLength", - Value: len(h.HashData), - Expected: fmt.Sprintf("%d", h.HashLength), - }) - } - n, err := w.Write(h.HashData) - if err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write hash data", pkgerrors.ValidationErrorContext{ - Field: "HashData", - Value: h.HashData, - Expected: "written successfully", - }) - } - if uint16(n) != h.HashLength { - return totalWritten, pkgerrors.NewPackageError(pkgerrors.ErrTypeIO, "incomplete hash data write", nil, pkgerrors.ValidationErrorContext{ - Field: "HashData", - Value: n, - Expected: fmt.Sprintf("%d bytes", h.HashLength), - }) - } - totalWritten += int64(n) + n, err := writeLengthPrefixedBytes(w, h.HashData, h.HashLength, "HashData") + if err != nil { + return totalWritten, err } + totalWritten += n return totalWritten, nil } diff --git a/api/go/metadata/hashentry_test.go b/api/go/metadata/hashentry_test.go index 1778620e..66e908af 100644 --- a/api/go/metadata/hashentry_test.go +++ b/api/go/metadata/hashentry_test.go @@ -2,7 +2,6 @@ package metadata import ( "bytes" - "io" "testing" "github.com/novus-engine/novuspack/api/go/fileformat" @@ -11,142 +10,44 @@ import ( // TestHashEntry_WriteTo tests the WriteTo method. 
func TestHashEntry_WriteTo(t *testing.T) { - tests := []struct { - name string - entry HashEntry - wantErr bool - }{ - { - name: "valid hash entry", - entry: HashEntry{ - HashType: fileformat.HashTypeSHA256, - HashPurpose: fileformat.HashPurposeContentVerification, - HashLength: 32, - HashData: make([]byte, 32), - }, - wantErr: false, - }, - { - name: "empty hash data", - entry: HashEntry{ - HashType: fileformat.HashTypeSHA256, - HashPurpose: fileformat.HashPurposeContentVerification, - HashLength: 0, - HashData: []byte{}, - }, - wantErr: false, - }, - { - name: "hash length mismatch", - entry: HashEntry{ - HashType: fileformat.HashTypeSHA256, - HashPurpose: fileformat.HashPurposeContentVerification, - HashLength: 32, - HashData: make([]byte, 16), // Mismatch - }, - wantErr: true, - }, - { - name: "empty hash length", - entry: HashEntry{ - HashType: fileformat.HashTypeSHA256, - HashPurpose: fileformat.HashPurposeContentVerification, - HashLength: 0, - HashData: []byte{}, - }, - wantErr: false, - }, - { - name: "incomplete write", - entry: HashEntry{ - HashType: fileformat.HashTypeSHA256, - HashPurpose: fileformat.HashPurposeContentVerification, - HashLength: 32, - HashData: make([]byte, 32), - }, - wantErr: false, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - var buf bytes.Buffer - n, err := tt.entry.WriteTo(&buf) - - if (err != nil) != tt.wantErr { - t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) - return - } - - if !tt.wantErr { - if n == 0 { - t.Error("WriteTo() wrote 0 bytes") - } - - // Verify minimum size: HashType (1) + HashPurpose (1) + HashLength (2) + HashData - minSize := int64(4 + len(tt.entry.HashData)) - if n < minSize { - t.Errorf("WriteTo() wrote %d bytes, want at least %d", n, minSize) - } - } - }) - } + runWriteToEntryTable(t, []writeToCase{ + {"valid hash entry", &HashEntry{ + HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, + HashLength: 32, HashData: 
make([]byte, 32), + }, false, 36}, + {"empty hash data", &HashEntry{ + HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, + HashLength: 0, HashData: []byte{}, + }, false, 4}, + {"hash length mismatch", &HashEntry{ + HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, + HashLength: 32, HashData: make([]byte, 16), + }, true, 0}, + {"empty hash length", &HashEntry{ + HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, + HashLength: 0, HashData: []byte{}, + }, false, 4}, + {"incomplete write", &HashEntry{ + HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, + HashLength: 32, HashData: make([]byte, 32), + }, false, 36}, + }) } // TestHashEntry_WriteTo_ErrorPaths tests error paths in WriteTo method. func TestHashEntry_WriteTo_ErrorPaths(t *testing.T) { - entry := HashEntry{ + entry := &HashEntry{ HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, HashLength: 32, HashData: make([]byte, 32), } - - tests := []struct { - name string - writer io.Writer - wantErr bool - }{ - { - name: "write error on HashType", - writer: testhelpers.NewErrorWriter(), - wantErr: true, - }, - { - name: "write error on HashPurpose", - writer: testhelpers.NewFailingWriter(1), // Fails after writing HashType - wantErr: true, - }, - { - name: "write error on HashLength", - writer: testhelpers.NewFailingWriter(2), // Fails after writing HashType and HashPurpose - wantErr: true, - }, - { - name: "write error on HashData", - writer: testhelpers.NewFailingWriter(4), // Fails after writing HashType, HashPurpose, and HashLength - wantErr: true, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - _, err := entry.WriteTo(tt.writer) - - if (err != nil) != tt.wantErr { - t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) - return - } - - if tt.wantErr { - if err == 
nil { - t.Error("WriteTo() expected error but got nil") - } - // Note: errorWriter returns error immediately, so bytes written may be 0 - // failingWriter may write some bytes before failing - } - }) - } + runWriteToErrorPathsTable(t, entry, []writeToErrorCase{ + {"write error on HashType", testhelpers.NewErrorWriter(), true}, + {"write error on HashPurpose", testhelpers.NewFailingWriter(1), true}, + {"write error on HashLength", testhelpers.NewFailingWriter(2), true}, + {"write error on HashData", testhelpers.NewFailingWriter(4), true}, + }) } // TestHashEntry_ReadFrom tests the ReadFrom method. @@ -160,12 +61,12 @@ func TestHashEntry_ReadFrom(t *testing.T) { } var buf bytes.Buffer - if _, err := entry.WriteTo(&buf); err != nil { + if _, err := entry.writeTo(&buf); err != nil { t.Fatalf("Failed to write test data: %v", err) } var readEntry HashEntry - n, err := readEntry.ReadFrom(&buf) + n, err := readEntry.readFrom(&buf) if err != nil { t.Fatalf("ReadFrom() error = %v", err) @@ -194,10 +95,7 @@ func TestHashEntry_ReadFrom(t *testing.T) { // TestHashEntry_ReadFrom_IncompleteData tests error handling for incomplete data. 
func TestHashEntry_ReadFrom_IncompleteData(t *testing.T) { - tests := []struct { - name string - data []byte - }{ + tests := []readFromIncompleteCase{ { name: "no data", data: []byte{}, @@ -220,15 +118,5 @@ func TestHashEntry_ReadFrom_IncompleteData(t *testing.T) { }, } - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - var entry HashEntry - r := bytes.NewReader(tt.data) - _, err := entry.ReadFrom(r) - - if err == nil { - t.Errorf("ReadFrom() expected error for incomplete data, got nil") - } - }) - } + runReadFromIncompleteTable(t, tests, func() readFromEntry { return &HashEntry{} }) } diff --git a/api/go/metadata/hashentry_validate_test.go b/api/go/metadata/hashentry_validate_test.go index 2b7a7751..f145abe6 100644 --- a/api/go/metadata/hashentry_validate_test.go +++ b/api/go/metadata/hashentry_validate_test.go @@ -6,61 +6,58 @@ import ( "github.com/novus-engine/novuspack/api/go/fileformat" ) +type hashEntryValidatable struct { + entry HashEntry +} + +func (h hashEntryValidatable) Validate() error { + return h.entry.validate() +} + // TestHashEntry_Validate tests the Validate method. 
func TestHashEntry_Validate(t *testing.T) { - tests := []struct { - name string - entry HashEntry - wantErr bool - }{ + tests := []validateCase{ { name: "valid hash entry", - entry: HashEntry{ + subject: hashEntryValidatable{entry: HashEntry{ HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, HashLength: 32, HashData: make([]byte, 32), - }, + }}, wantErr: false, }, { name: "empty hash data", - entry: HashEntry{ + subject: hashEntryValidatable{entry: HashEntry{ HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, HashLength: 0, HashData: []byte{}, - }, + }}, wantErr: true, }, { name: "hash length mismatch", - entry: HashEntry{ + subject: hashEntryValidatable{entry: HashEntry{ HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, HashLength: 32, HashData: make([]byte, 16), - }, + }}, wantErr: true, }, { name: "nil hash data", - entry: HashEntry{ + subject: hashEntryValidatable{entry: HashEntry{ HashType: fileformat.HashTypeSHA256, HashPurpose: fileformat.HashPurposeContentVerification, HashLength: 32, HashData: nil, - }, + }}, wantErr: true, }, } - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - err := tt.entry.Validate() - if (err != nil) != tt.wantErr { - t.Errorf("Validate() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } + runValidateTable(t, tests) } diff --git a/api/go/metadata/length_prefixed_io.go b/api/go/metadata/length_prefixed_io.go new file mode 100644 index 00000000..0b8f8056 --- /dev/null +++ b/api/go/metadata/length_prefixed_io.go @@ -0,0 +1,68 @@ +// This file provides shared read/write of length-prefixed byte slices used by +// HashEntry and OptionalDataEntry to avoid duplicate I/O logic. 
+// +// Specification: package_file_format.md: 4.1.4.3 Hash Data + +package metadata + +import ( + "fmt" + "io" + + "github.com/novus-engine/novuspack/api/go/pkgerrors" +) + +// readLengthPrefixedBytes reads length bytes from r into a new slice. +func readLengthPrefixedBytes(r io.Reader, length uint16, fieldName, expectedDesc string) (data []byte, n int64, err error) { + if length == 0 { + return nil, 0, nil + } + data = make([]byte, length) + var readN int + readN, err = io.ReadFull(r, data) + if err != nil { + return nil, 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read "+fieldName, pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: length, + Expected: expectedDesc, + }) + } + if uint16(readN) != length { + return nil, 0, pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "incomplete "+fieldName+" read", nil, pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: readN, + Expected: fmt.Sprintf("%d bytes", length), + }) + } + return data, int64(readN), nil +} + +// writeLengthPrefixedBytes validates length matches data then writes data to w. 
+func writeLengthPrefixedBytes(w io.Writer, data []byte, length uint16, fieldName string) (int64, error) { + if length == 0 { + return 0, nil + } + if uint16(len(data)) != length { + return 0, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "length mismatch", nil, pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: len(data), + Expected: fmt.Sprintf("%d", length), + }) + } + n, err := w.Write(data) + if err != nil { + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write "+fieldName, pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: data, + Expected: "written successfully", + }) + } + if uint16(n) != length { + return 0, pkgerrors.NewPackageError(pkgerrors.ErrTypeIO, "incomplete "+fieldName+" write", nil, pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: n, + Expected: fmt.Sprintf("%d bytes", length), + }) + } + return int64(n), nil +} diff --git a/api/go/metadata/optionaldata.go b/api/go/metadata/optionaldata.go index 62bd74d1..9396de92 100644 --- a/api/go/metadata/optionaldata.go +++ b/api/go/metadata/optionaldata.go @@ -10,7 +10,6 @@ package metadata import ( "encoding/binary" - "fmt" "io" "github.com/novus-engine/novuspack/api/go/pkgerrors" @@ -48,30 +47,14 @@ type OptionalDataEntry struct { // - Data must not be nil or empty // // Returns an error if any validation check fails. 
-func (o *OptionalDataEntry) Validate() error { - if len(o.Data) == 0 { - return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "optional data cannot be nil or empty", nil, pkgerrors.ValidationErrorContext{ - Field: "Data", - Value: nil, - Expected: "non-empty data", - }) - } - - if uint16(len(o.Data)) != o.DataLength { - return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "data length mismatch", nil, pkgerrors.ValidationErrorContext{ - Field: "DataLength", - Value: o.DataLength, - Expected: fmt.Sprintf("%d", len(o.Data)), - }) - } - - return nil +func (o *OptionalDataEntry) validate() error { + return validateSliceLength(len(o.Data), o.DataLength, "Data", "optional data cannot be nil or empty", "non-empty data") } // Size returns the total size of the OptionalDataEntry in bytes. // // Specification: package_file_format.md: 4.1.4.4 Optional Data -func (o *OptionalDataEntry) Size() int { +func (o OptionalDataEntry) size() int { return 3 + int(o.DataLength) // Type(1) + Length(2) + Data } @@ -85,7 +68,7 @@ func (o *OptionalDataEntry) Size() int { // Returns the number of bytes read and any error encountered. 
// // Specification: package_file_format.md: 4.1.4.4 Optional Data -func (o *OptionalDataEntry) ReadFrom(r io.Reader) (int64, error) { +func (o *OptionalDataEntry) readFrom(r io.Reader) (int64, error) { var totalRead int64 // Read DataType (1 byte) @@ -113,28 +96,12 @@ func (o *OptionalDataEntry) ReadFrom(r io.Reader) (int64, error) { o.DataLength = dataLength // Read Data (DataLength bytes) - if dataLength > 0 { - data := make([]byte, dataLength) - n, err := io.ReadFull(r, data) - if err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read data", pkgerrors.ValidationErrorContext{ - Field: "Data", - Value: dataLength, - Expected: "data bytes", - }) - } - if uint16(n) != dataLength { - return totalRead, pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "incomplete data read", nil, pkgerrors.ValidationErrorContext{ - Field: "Data", - Value: n, - Expected: fmt.Sprintf("%d bytes", dataLength), - }) - } - totalRead += int64(n) - o.Data = data - } else { - o.Data = nil + data, n, err := readLengthPrefixedBytes(r, dataLength, "Data", "data bytes") + if err != nil { + return totalRead, err } + totalRead += n + o.Data = data return totalRead, nil } @@ -149,7 +116,7 @@ func (o *OptionalDataEntry) ReadFrom(r io.Reader) (int64, error) { // Returns the number of bytes written and any error encountered. 
// // Specification: package_file_format.md: 4.1.4.4 Optional Data -func (o *OptionalDataEntry) WriteTo(w io.Writer) (int64, error) { +func (o *OptionalDataEntry) writeTo(w io.Writer) (int64, error) { var totalWritten int64 // Write DataType (1 byte) @@ -173,31 +140,11 @@ func (o *OptionalDataEntry) WriteTo(w io.Writer) (int64, error) { totalWritten += 2 // Write Data (DataLength bytes) - if o.DataLength > 0 { - if uint16(len(o.Data)) != o.DataLength { - return totalWritten, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "data length mismatch", nil, pkgerrors.ValidationErrorContext{ - Field: "DataLength", - Value: len(o.Data), - Expected: fmt.Sprintf("%d", o.DataLength), - }) - } - n, err := w.Write(o.Data) - if err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write data", pkgerrors.ValidationErrorContext{ - Field: "Data", - Value: o.Data, - Expected: "written successfully", - }) - } - if uint16(n) != o.DataLength { - return totalWritten, pkgerrors.NewPackageError(pkgerrors.ErrTypeIO, "incomplete data write", nil, pkgerrors.ValidationErrorContext{ - Field: "Data", - Value: n, - Expected: fmt.Sprintf("%d bytes", o.DataLength), - }) - } - totalWritten += int64(n) + n, err := writeLengthPrefixedBytes(w, o.Data, o.DataLength, "Data") + if err != nil { + return totalWritten, err } + totalWritten += n return totalWritten, nil } diff --git a/api/go/metadata/optionaldata_test.go b/api/go/metadata/optionaldata_test.go index 7f66b8a8..11fbcc32 100644 --- a/api/go/metadata/optionaldata_test.go +++ b/api/go/metadata/optionaldata_test.go @@ -2,7 +2,6 @@ package metadata import ( "bytes" - "io" "testing" "github.com/novus-engine/novuspack/api/go/internal/testhelpers" @@ -10,127 +9,35 @@ import ( // TestOptionalDataEntry_WriteTo tests the WriteTo method. 
func TestOptionalDataEntry_WriteTo(t *testing.T) { - tests := []struct { - name string - entry OptionalDataEntry - wantErr bool - }{ - { - name: "valid optional data entry", - entry: OptionalDataEntry{ - DataType: OptionalDataTagsData, - DataLength: 10, - Data: make([]byte, 10), - }, - wantErr: false, - }, - { - name: "empty data", - entry: OptionalDataEntry{ - DataType: OptionalDataTagsData, - DataLength: 0, - Data: []byte{}, - }, - wantErr: false, - }, - { - name: "data length mismatch", - entry: OptionalDataEntry{ - DataType: OptionalDataTagsData, - DataLength: 10, - Data: make([]byte, 5), // Mismatch - }, - wantErr: true, - }, - { - name: "incomplete write", - entry: OptionalDataEntry{ - DataType: OptionalDataTagsData, - DataLength: 10, - Data: make([]byte, 10), - }, - wantErr: false, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - var buf bytes.Buffer - n, err := tt.entry.WriteTo(&buf) - - if (err != nil) != tt.wantErr { - t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) - return - } - - if !tt.wantErr { - if n == 0 { - t.Error("WriteTo() wrote 0 bytes") - } - - // Verify minimum size: DataType (1) + DataLength (2) + Data - minSize := int64(3 + len(tt.entry.Data)) - if n < minSize { - t.Errorf("WriteTo() wrote %d bytes, want at least %d", n, minSize) - } - } - }) - } + runWriteToEntryTable(t, []writeToCase{ + {"valid optional data entry", &OptionalDataEntry{ + DataType: OptionalDataTagsData, DataLength: 10, Data: make([]byte, 10), + }, false, 13}, + {"empty data", &OptionalDataEntry{ + DataType: OptionalDataTagsData, DataLength: 0, Data: []byte{}, + }, false, 3}, + {"data length mismatch", &OptionalDataEntry{ + DataType: OptionalDataTagsData, DataLength: 10, Data: make([]byte, 5), + }, true, 0}, + {"incomplete write", &OptionalDataEntry{ + DataType: OptionalDataTagsData, DataLength: 10, Data: make([]byte, 10), + }, false, 13}, + }) } // TestOptionalDataEntry_WriteTo_ErrorPaths tests error paths in WriteTo method. 
func TestOptionalDataEntry_WriteTo_ErrorPaths(t *testing.T) { - entry := OptionalDataEntry{ + entry := &OptionalDataEntry{ DataType: OptionalDataTagsData, DataLength: 10, Data: make([]byte, 10), } - - tests := []struct { - name string - writer io.Writer - wantErr bool - }{ - { - name: "write error on DataType", - writer: testhelpers.NewErrorWriter(), - wantErr: true, - }, - { - name: "write error on DataLength", - writer: testhelpers.NewFailingWriter(1), // Fails after writing DataType - wantErr: true, - }, - { - name: "write error on Data", - writer: testhelpers.NewFailingWriter(3), // Fails after writing DataType and DataLength - wantErr: true, - }, - { - name: "incomplete write on Data", - writer: testhelpers.NewFailingWriter(5), // Fails after writing DataType, DataLength, and partial Data - wantErr: true, - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - _, err := entry.WriteTo(tt.writer) - - if (err != nil) != tt.wantErr { - t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) - return - } - - if tt.wantErr { - if err == nil { - t.Error("WriteTo() expected error but got nil") - } - // Note: errorWriter returns error immediately, so bytes written may be 0 - // failingWriter may write some bytes before failing - } - }) - } + runWriteToErrorPathsTable(t, entry, []writeToErrorCase{ + {"write error on DataType", testhelpers.NewErrorWriter(), true}, + {"write error on DataLength", testhelpers.NewFailingWriter(1), true}, + {"write error on Data", testhelpers.NewFailingWriter(3), true}, + {"incomplete write on Data", testhelpers.NewFailingWriter(5), true}, + }) } // TestOptionalDataEntry_ReadFrom tests the ReadFrom method. 
@@ -143,12 +50,12 @@ func TestOptionalDataEntry_ReadFrom(t *testing.T) { } var buf bytes.Buffer - if _, err := entry.WriteTo(&buf); err != nil { + if _, err := entry.writeTo(&buf); err != nil { t.Fatalf("Failed to write test data: %v", err) } var readEntry OptionalDataEntry - n, err := readEntry.ReadFrom(&buf) + n, err := readEntry.readFrom(&buf) if err != nil { t.Fatalf("ReadFrom() error = %v", err) @@ -166,44 +73,18 @@ func TestOptionalDataEntry_ReadFrom(t *testing.T) { t.Errorf("ReadFrom() DataLength = %v, want %v", readEntry.DataLength, entry.DataLength) } - if string(readEntry.Data) != string(entry.Data) { + if !bytes.Equal(readEntry.Data, entry.Data) { t.Errorf("ReadFrom() Data = %q, want %q", string(readEntry.Data), string(entry.Data)) } } // TestOptionalDataEntry_ReadFrom_IncompleteData tests error handling for incomplete data. func TestOptionalDataEntry_ReadFrom_IncompleteData(t *testing.T) { - tests := []struct { - name string - data []byte - }{ - { - name: "no data", - data: []byte{}, - }, - { - name: "incomplete DataLength", - data: []byte{0x00}, // Only 1 byte (need 2 for DataLength) - }, - { - name: "incomplete Data", - data: []byte{0x00, 0x0A, 0x00}, // DataType + DataLength (10) + only 1 byte of data (need 10) - }, - { - name: "partial Data read", - data: []byte{0x00, 0x0A, 0x00, 0x74, 0x65, 0x73, 0x74}, // DataType + DataLength (10) + only 4 bytes of data (need 10) - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - var entry OptionalDataEntry - r := bytes.NewReader(tt.data) - _, err := entry.ReadFrom(r) - - if err == nil { - t.Errorf("ReadFrom() expected error for incomplete data, got nil") - } - }) + tests := []readFromIncompleteCase{ + {"no data", []byte{}}, + {"incomplete DataLength", []byte{0x00}}, + {"incomplete Data", []byte{0x00, 0x0A, 0x00}}, + {"partial Data read", []byte{0x00, 0x0A, 0x00, 0x74, 0x65, 0x73, 0x74}}, } + runReadFromIncompleteTable(t, tests, func() readFromEntry { return &OptionalDataEntry{} }) } 
diff --git a/api/go/metadata/optionaldata_validate_test.go b/api/go/metadata/optionaldata_validate_test.go index a2a58993..48d35aaa 100644 --- a/api/go/metadata/optionaldata_validate_test.go +++ b/api/go/metadata/optionaldata_validate_test.go @@ -1,60 +1,55 @@ package metadata -import ( - "testing" -) +import "testing" + +type optionalDataValidatable struct { + entry OptionalDataEntry +} + +func (o optionalDataValidatable) Validate() error { + return o.entry.validate() +} // TestOptionalDataEntry_Validate tests the Validate method. func TestOptionalDataEntry_Validate(t *testing.T) { - tests := []struct { - name string - entry OptionalDataEntry - wantErr bool - }{ + tests := []validateCase{ { name: "valid optional data entry", - entry: OptionalDataEntry{ + subject: optionalDataValidatable{entry: OptionalDataEntry{ DataType: OptionalDataTagsData, DataLength: 10, Data: make([]byte, 10), - }, + }}, wantErr: false, }, { name: "empty data", - entry: OptionalDataEntry{ + subject: optionalDataValidatable{entry: OptionalDataEntry{ DataType: OptionalDataTagsData, DataLength: 0, Data: []byte{}, - }, + }}, wantErr: true, }, { name: "data length mismatch", - entry: OptionalDataEntry{ + subject: optionalDataValidatable{entry: OptionalDataEntry{ DataType: OptionalDataTagsData, DataLength: 10, Data: make([]byte, 5), - }, + }}, wantErr: true, }, { name: "nil data", - entry: OptionalDataEntry{ + subject: optionalDataValidatable{entry: OptionalDataEntry{ DataType: OptionalDataTagsData, DataLength: 10, Data: nil, - }, + }}, wantErr: true, }, } - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - err := tt.entry.Validate() - if (err != nil) != tt.wantErr { - t.Errorf("Validate() error = %v, wantErr %v", err, tt.wantErr) - } - }) - } + runValidateTable(t, tests) } diff --git a/api/go/metadata/package_info_test.go b/api/go/metadata/package_info_test.go index a9a19b2d..cc3e20b5 100644 --- a/api/go/metadata/package_info_test.go +++ b/api/go/metadata/package_info_test.go @@ 
-11,6 +11,8 @@ import ( ) // TestPackageInfo tests the PackageInfo struct. +// +//nolint:gocognit,gocyclo // table-driven info cases func TestPackageInfo(t *testing.T) { now := time.Now() signatureInfo := signatures.SignatureInfo{ @@ -117,6 +119,8 @@ func TestPackageInfo(t *testing.T) { } // TestNewPackageInfo tests the NewPackageInfo function. +// +//nolint:gocognit,gocyclo // table-driven new-info cases func TestNewPackageInfo(t *testing.T) { info := NewPackageInfo() @@ -219,6 +223,8 @@ func TestNewPackageInfo(t *testing.T) { } // TestPackageInfo_ZeroValue tests the zero value of PackageInfo. +// +//nolint:gocognit,gocyclo // table-driven zero-value cases func TestPackageInfo_ZeroValue(t *testing.T) { info := PackageInfo{} diff --git a/api/go/metadata/package_metadata.go b/api/go/metadata/package_metadata.go index 3a8a35f0..858fd2b3 100644 --- a/api/go/metadata/package_metadata.go +++ b/api/go/metadata/package_metadata.go @@ -1,11 +1,11 @@ // This file implements the PackageMetadata structure providing comprehensive // package metadata including all package information plus detailed file and // metadata file contents. It contains the PackageMetadata type definition -// and NewPackageMetadata constructor as specified in api_core.md Section 1.1.6 -// and api_metadata.md Section 7.5. +// and NewPackageMetadata constructor as specified in api_core.md Section 1.2.6 +// and api_metadata.md Section 7.1. // -// Specification: api_core.md: 1.1.6 GetMetadata Method Contract -// Specification: api_metadata.md: 1. Comment Management +// Specification: api_core.md: 1.2.6 Package.GetMetadata Method +// Specification: api_metadata.md: 7.1 PackageInfo Structure // Package metadata provides metadata domain structures for the NovusPack implementation. // @@ -23,8 +23,8 @@ package metadata // This method MUST NOT perform additional disk I/O or parsing beyond what OpenPackage // already loaded. All data in PackageMetadata comes from already-loaded package state. 
// -// Specification: api_core.md: 1.1.6.5 PackageMetadata Contents -// Specification: api_metadata.md: 1. Comment Management +// Specification: api_core.md: 1.2.6.2 Package.GetMetadata Returns +// Specification: api_metadata.md: 7.1 PackageInfo Structure type PackageMetadata struct { // Embed PackageInfo for basic package information *PackageInfo @@ -49,7 +49,7 @@ type PackageMetadata struct { // Returns: // - *PackageMetadata: A new PackageMetadata instance with default values // -// Specification: api_metadata.md: 1. Comment Management +// Specification: api_metadata.md: 7.1 PackageInfo Structure func NewPackageMetadata() *PackageMetadata { return &PackageMetadata{ PackageInfo: NewPackageInfo(), diff --git a/api/go/metadata/package_metadata_test.go b/api/go/metadata/package_metadata_test.go index 8a145e74..eb71c63a 100644 --- a/api/go/metadata/package_metadata_test.go +++ b/api/go/metadata/package_metadata_test.go @@ -45,35 +45,52 @@ func TestPackageMetadata_ContainsPackageInfo(t *testing.T) { } } -// TestPackageMetadata_ContainsFileEntries tests that PackageMetadata contains FileEntries slice. 
-func TestPackageMetadata_ContainsFileEntries(t *testing.T) { +func runContainsSliceTest(t *testing.T, fieldName string, checkNil func(*PackageMetadata) bool, getLen func(*PackageMetadata) int, appendNil func(*PackageMetadata)) { + t.Helper() pm := NewPackageMetadata() - if pm.FileEntries == nil { - t.Fatal("FileEntries is nil") + if checkNil(pm) { + t.Fatalf("%s is nil", fieldName) } - if len(pm.FileEntries) != 0 { - t.Errorf("FileEntries length = %v, want 0", len(pm.FileEntries)) + if getLen(pm) != 0 { + t.Errorf("%s length = %v, want 0", fieldName, getLen(pm)) } - // Test that we can append to FileEntries - pm.FileEntries = append(pm.FileEntries, nil) - if len(pm.FileEntries) != 1 { - t.Errorf("FileEntries length after append = %v, want 1", len(pm.FileEntries)) + appendNil(pm) + if getLen(pm) != 1 { + t.Errorf("%s length after append = %v, want 1", fieldName, getLen(pm)) } } -// TestPackageMetadata_ContainsPathMetadataEntries tests that PackageMetadata contains PathMetadataEntries slice. 
-func TestPackageMetadata_ContainsPathMetadataEntries(t *testing.T) { - pm := NewPackageMetadata() - if pm.PathMetadataEntries == nil { - t.Fatal("PathMetadataEntries is nil") - } - if len(pm.PathMetadataEntries) != 0 { - t.Errorf("PathMetadataEntries length = %v, want 0", len(pm.PathMetadataEntries)) - } - // Test that we can append to PathMetadataEntries - pm.PathMetadataEntries = append(pm.PathMetadataEntries, nil) - if len(pm.PathMetadataEntries) != 1 { - t.Errorf("PathMetadataEntries length after append = %v, want 1", len(pm.PathMetadataEntries)) +type containsSliceCase struct { + name string + fieldName string + checkNil func(*PackageMetadata) bool + getLen func(*PackageMetadata) int + appendNil func(*PackageMetadata) +} + +func makeContainsSliceCase(name, fieldName string, checkNil func(*PackageMetadata) bool, getLen func(*PackageMetadata) int, appendNil func(*PackageMetadata)) containsSliceCase { + return containsSliceCase{name, fieldName, checkNil, getLen, appendNil} +} + +//nolint:dupl // two slice fields require separate case builders with different accessors +func containsSliceCasesForTest() []containsSliceCase { + fileEntriesCase := makeContainsSliceCase("FileEntries", "FileEntries", + func(pm *PackageMetadata) bool { return pm.FileEntries == nil }, + func(pm *PackageMetadata) int { return len(pm.FileEntries) }, + func(pm *PackageMetadata) { pm.FileEntries = append(pm.FileEntries, nil) }) + pathMetaCase := makeContainsSliceCase("PathMetadataEntries", "PathMetadataEntries", + func(pm *PackageMetadata) bool { return pm.PathMetadataEntries == nil }, + func(pm *PackageMetadata) int { return len(pm.PathMetadataEntries) }, + func(pm *PackageMetadata) { pm.PathMetadataEntries = append(pm.PathMetadataEntries, nil) }) + return []containsSliceCase{fileEntriesCase, pathMetaCase} +} + +// TestPackageMetadata_ContainsSlices tests that PackageMetadata contains FileEntries and PathMetadataEntries slices. 
+func TestPackageMetadata_ContainsSlices(t *testing.T) { + for _, tt := range containsSliceCasesForTest() { + t.Run(tt.name, func(t *testing.T) { + runContainsSliceTest(t, tt.fieldName, tt.checkNil, tt.getLen, tt.appendNil) + }) } } diff --git a/api/go/metadata/path_metadata_entry_methods.go b/api/go/metadata/path_metadata_entry_methods.go index 0c87fa7e..ca424356 100644 --- a/api/go/metadata/path_metadata_entry_methods.go +++ b/api/go/metadata/path_metadata_entry_methods.go @@ -235,6 +235,8 @@ func (pme *PathMetadataEntry) GetAncestors() []*PathMetadataEntry { // - error: *PackageError on failure // // Specification: api_metadata.md: 8.1.2 PathMetadataEntry Structure +// +//nolint:gocognit // hierarchy walk and tag merge branches func (pme *PathMetadataEntry) GetInheritedTags() ([]*generics.Tag[any], error) { if pme.ParentPath == nil { return []*generics.Tag[any]{}, nil @@ -318,6 +320,8 @@ func (pme *PathMetadataEntry) GetInheritedTags() ([]*generics.Tag[any], error) { // - error: *PackageError on failure // // Specification: api_metadata.md: 8.1.2 PathMetadataEntry Structure +// +//nolint:gocognit // inheritance and merge logic branches func (pme *PathMetadataEntry) GetEffectiveTags() ([]*generics.Tag[any], error) { tagMap := make(map[string]*generics.Tag[any]) diff --git a/api/go/metadata/path_metadata_entry_tags.go b/api/go/metadata/path_metadata_entry_tags.go index 95becb4c..ad874324 100644 --- a/api/go/metadata/path_metadata_entry_tags.go +++ b/api/go/metadata/path_metadata_entry_tags.go @@ -60,16 +60,7 @@ func GetPathMetaTagsByType[T any](pme *PathMetadataEntry) ([]*generics.Tag[T], e if err != nil { return nil, err } - - result := make([]*generics.Tag[T], 0) - for i := range allTags { - // Type assert the value to ensure it's of type T - if typedValue, ok := allTags[i].Value.(T); ok { - result = append(result, generics.NewTag(allTags[i].Key, typedValue, allTags[i].Type)) - } - } - - return result, nil + return filterTagsByType[T](allTags), nil } // 
GetPathMetaTag retrieves a type-safe tag by key from a PathMetadataEntry. diff --git a/api/go/metadata/path_metadata_entry_test.go b/api/go/metadata/path_metadata_entry_test.go index 44b8cec1..9cc0fdb9 100644 --- a/api/go/metadata/path_metadata_entry_test.go +++ b/api/go/metadata/path_metadata_entry_test.go @@ -458,38 +458,7 @@ func TestPathMetadataEntry_ParentPath(t *testing.T) { // TestPathMetadataEntry_GetInheritedTags tests the GetInheritedTags method. func TestPathMetadataEntry_GetInheritedTags(t *testing.T) { - // Create hierarchy: root -> parent -> child - root := &PathMetadataEntry{ - Path: generics.PathEntry{PathLength: 1, Path: "/"}, - Type: PathMetadataTypeDirectory, - Inheritance: &PathInheritance{ - Enabled: true, - Priority: 1, - }, - Properties: []*generics.Tag[any]{ - { - Key: "root-tag", - Value: "root-value", - Type: generics.TagValueTypeString, - }, - }, - } - - parent := &PathMetadataEntry{ - Path: generics.PathEntry{PathLength: 4, Path: "dir"}, - Type: PathMetadataTypeDirectory, - Inheritance: &PathInheritance{ - Enabled: true, - Priority: 2, - }, - Properties: []*generics.Tag[any]{ - { - Key: "parent-tag", - Value: "parent-value", - Type: generics.TagValueTypeString, - }, - }, - } + root, parent := pathMetadataRootParentFixture() parent.SetParentPath(root) child := &PathMetadataEntry{ diff --git a/api/go/metadata/path_metadata_inheritance_fixture_test.go b/api/go/metadata/path_metadata_inheritance_fixture_test.go new file mode 100644 index 00000000..3fcdb210 --- /dev/null +++ b/api/go/metadata/path_metadata_inheritance_fixture_test.go @@ -0,0 +1,22 @@ +// Shared root/parent PathMetadataEntry fixture for inheritance tests. 
+ +package metadata + +import "github.com/novus-engine/novuspack/api/go/generics" + +func pathMetadataInheritanceEntry(pathLen int, pathStr, tagKey, tagValue string, priority int) *PathMetadataEntry { + return &PathMetadataEntry{ + Path: generics.PathEntry{PathLength: uint16(pathLen), Path: pathStr}, + Type: PathMetadataTypeDirectory, + Inheritance: &PathInheritance{Enabled: true, Priority: priority}, + Properties: []*generics.Tag[any]{{Key: tagKey, Value: tagValue, Type: generics.TagValueTypeString}}, + } +} + +// pathMetadataRootParentFixture returns root and parent entries for inheritance tests. +// Caller should call parent.SetParentPath(root) to link them. +func pathMetadataRootParentFixture() (root, parent *PathMetadataEntry) { + root = pathMetadataInheritanceEntry(1, "/", "root-tag", "root-value", 1) + parent = pathMetadataInheritanceEntry(4, "dir", "parent-tag", "parent-value", 2) + return root, parent +} diff --git a/api/go/metadata/slice_validation.go b/api/go/metadata/slice_validation.go new file mode 100644 index 00000000..7e2bcb70 --- /dev/null +++ b/api/go/metadata/slice_validation.go @@ -0,0 +1,31 @@ +// This file provides shared validation for slice length used by HashEntry and +// OptionalDataEntry to avoid duplicate validation logic. +// +// Specification: package_file_format.md: 4.1.4.3 Hash Data + +package metadata + +import ( + "fmt" + + "github.com/novus-engine/novuspack/api/go/pkgerrors" +) + +// validateSliceLength ensures the slice is non-empty and lengthField matches slice length. 
+func validateSliceLength(sliceLen int, lengthField uint16, fieldName, emptyErrMsg, emptyExpected string) error { + if sliceLen == 0 { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, emptyErrMsg, nil, pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: nil, + Expected: emptyExpected, + }) + } + if uint16(sliceLen) != lengthField { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "length mismatch", nil, pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: lengthField, + Expected: fmt.Sprintf("%d", sliceLen), + }) + } + return nil +} diff --git a/api/go/metadata/tags_filter.go b/api/go/metadata/tags_filter.go new file mode 100644 index 00000000..1577c2a8 --- /dev/null +++ b/api/go/metadata/tags_filter.go @@ -0,0 +1,19 @@ +// This file provides shared tag filtering logic used by FileEntry and +// PathMetadataEntry tag operations to avoid duplicate type-filter loops. +// +// Specification: api_file_mgmt_file_entry.md: 3.3.2 Getting Tags by Type + +package metadata + +import "github.com/novus-engine/novuspack/api/go/generics" + +// filterTagsByType returns tags from allTags whose Value is assignable to type T. +func filterTagsByType[T any](allTags []*generics.Tag[any]) []*generics.Tag[T] { + result := make([]*generics.Tag[T], 0) + for i := range allTags { + if typedValue, ok := allTags[i].Value.(T); ok { + result = append(result, generics.NewTag(allTags[i].Key, typedValue, allTags[i].Type)) + } + } + return result +} diff --git a/api/go/metadata/validate_table_test.go b/api/go/metadata/validate_table_test.go new file mode 100644 index 00000000..f0d089d0 --- /dev/null +++ b/api/go/metadata/validate_table_test.go @@ -0,0 +1,27 @@ +// Shared table runner for Validate() tests across metadata types. 
+ +package metadata + +import "testing" + +type validatable interface { + Validate() error +} + +type validateCase struct { + name string + subject validatable + wantErr bool +} + +func runValidateTable(t *testing.T, tests []validateCase) { + t.Helper() + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + err := tt.subject.Validate() + if (err != nil) != tt.wantErr { + t.Errorf("Validate() error = %v, wantErr %v", err, tt.wantErr) + } + }) + } +} diff --git a/api/go/novus_package/package.go b/api/go/novus_package/package.go index d82c1ce1..22e6c0f8 100644 --- a/api/go/novus_package/package.go +++ b/api/go/novus_package/package.go @@ -1,6 +1,6 @@ // This file defines the core Package interface and filePackage implementation. -// It contains Package, PackageReader, and PackageWriter interfaces as specified -// in api_core.md, along with the filePackage struct that implements these interfaces. +// It contains the Package interface as specified in api_core.md, along with the +// filePackage struct that implements it. // This file should contain the main Package type definition, interface declarations, // and the NewPackage constructor function. // @@ -40,46 +40,29 @@ import ( ) // ============================================================================= -// INTERFACES +// INTERFACE // ============================================================================= -// PackageReader defines the interface for reading operations on a package. +// Package defines the main interface for NovusPack package operations. // -// PackageReader provides methods for reading files, listing files, retrieving -// metadata, and validating package contents. +// Package provides the unified v1 API surface for read operations, write +// operations, lifecycle management, file management, and metadata handling. 
// -// Specification: api_core.md: 1.1 PackageReader Interface -type PackageReader interface { +// Specification: api_core.md: 1.1 Package Interface +type Package interface { + // Read operations ReadFile(ctx context.Context, path string) ([]byte, error) ListFiles() ([]FileInfo, error) GetMetadata() (*metadata.PackageMetadata, error) Validate(ctx context.Context) error GetInfo() (*metadata.PackageInfo, error) -} -// PackageWriter defines the interface for writing operations on a package. -// -// PackageWriter provides methods for adding files, removing files, and writing -// the package to disk. Compression and signing options are configured via package -// state rather than method parameters. -// -// Specification: api_core.md: 1.2 PackageWriter Interface -type PackageWriter interface { + // Write operations Write(ctx context.Context) error SafeWrite(ctx context.Context, overwrite bool) error FastWrite(ctx context.Context) error -} -// Package defines the main interface for NovusPack package operations. -// -// Package combines PackageReader and PackageWriter interfaces, providing -// complete package lifecycle management including opening, closing, and -// defragmentation operations. -// -// Specification: api_core.md: 1.3 Package Interface -type Package interface { - PackageReader - PackageWriter + // Lifecycle operations Create(ctx context.Context, path string) error CreateWithOptions(ctx context.Context, path string, options *CreateOptions) error Close() error @@ -90,11 +73,11 @@ type Package interface { Defragment(ctx context.Context) error // Target path management - // Specification: api_core.md: 1.2 PackageWriter Interface + // Specification: api_basic_operations.md: 8. Package.SetTargetPath Method SetTargetPath(ctx context.Context, path string) error // Session base management - // Specification: api_basic_operations.md: 3.1 Package Implementation Structure + // Specification: api_basic_operations.md: 19. 
Package Session Base Management SetSessionBase(basePath string) error GetSessionBase() string ClearSessionBase() @@ -110,8 +93,8 @@ type Package interface { // File removal operations // Specification: api_file_mgmt_removal.md: 2. RemoveFile Package Method RemoveFile(ctx context.Context, path string) error - RemoveFilePattern(ctx context.Context, pattern string) error - RemoveDirectory(ctx context.Context, dirPath string) error + RemoveFilePattern(ctx context.Context, pattern string) ([]string, error) + RemoveDirectory(ctx context.Context, dirPath string, options *RemoveDirectoryOptions) ([]string, error) // Comment management operations // Specification: api_metadata.md: 1. Comment Management @@ -142,7 +125,7 @@ type Package interface { // filePackage is the concrete implementation of the Package interface. // // filePackage provides the main implementation for interacting with NovusPack files. -// It implements Package, PackageReader, and PackageWriter interfaces. +// It implements the Package interface. // // Lifecycle States: // - New: Created via NewPackage(), not yet associated with a file @@ -216,7 +199,7 @@ type filePackage struct { // } // defer pkg.Close() // -// Specification: api_basic_operations.md: 6.1 Package Constructor +// Specification: api_basic_operations.md: 6.1 NewPackage Behavior func NewPackage() (Package, error) { // Initialize package with default values pkg := &filePackage{ diff --git a/api/go/novus_package/package_builder.go b/api/go/novus_package/package_builder.go index ece1f040..b806d0e5 100644 --- a/api/go/novus_package/package_builder.go +++ b/api/go/novus_package/package_builder.go @@ -36,13 +36,6 @@ type PackageBuilder interface { Build(ctx context.Context) (Package, error) } -// EncryptionType represents the type of encryption to use. -// -// This type is used by PackageBuilder interface. -// Use constants from fileformat package (EncryptionNone, EncryptionAES256GCM, etc.) -// via the novuspack package re-exports. 
-type EncryptionType uint8 - // packageBuilder is the concrete implementation of PackageBuilder. type packageBuilder struct { compression CompressionType diff --git a/api/go/novus_package/package_builder_test.go b/api/go/novus_package/package_builder_test.go index b162efb8..9d370b52 100644 --- a/api/go/novus_package/package_builder_test.go +++ b/api/go/novus_package/package_builder_test.go @@ -43,8 +43,8 @@ func TestPackageBuilder_WithEncryption(t *testing.T) { // TestPackageBuilder_WithMetadata tests the WithMetadata method. func TestPackageBuilder_WithMetadata(t *testing.T) { builder := NewBuilder() - metadata := map[string]string{"key": "value"} - result := builder.WithMetadata(metadata) + md := map[string]string{"key": "value"} + result := builder.WithMetadata(md) if result != builder { t.Errorf("WithMetadata() should return the same builder instance") } @@ -192,14 +192,9 @@ func TestPackageBuilder_Build_SetCommentError(t *testing.T) { } } -// TestPackageBuilder_Build_SetVendorIDError tests Build when SetVendorID fails. 
-func TestPackageBuilder_Build_SetVendorIDError(t *testing.T) { +func runBuildAndAssertID[T comparable](t *testing.T, builder PackageBuilder, getter func(Package) T, want T, fieldName string) { + t.Helper() ctx := context.Background() - builder := NewBuilder().WithVendorID(12345) - - // We need to somehow make SetVendorID fail, but it only fails if Info is nil - // Since NewPackage always creates Info, we can't easily trigger this error - // This test documents the limitation pkg, err := builder.Build(ctx) if err != nil { t.Fatalf("Build() failed: %v", err) @@ -207,26 +202,17 @@ func TestPackageBuilder_Build_SetVendorIDError(t *testing.T) { if pkg == nil { t.Fatal("Build() returned nil package") } - if pkg.GetVendorID() != 12345 { - t.Errorf("VendorID = %d, want 12345", pkg.GetVendorID()) + if getter(pkg) != want { + t.Errorf("%s = %v, want %v", fieldName, getter(pkg), want) } } +// TestPackageBuilder_Build_SetVendorIDError tests Build when SetVendorID fails. +func TestPackageBuilder_Build_SetVendorIDError(t *testing.T) { + runBuildAndAssertID(t, NewBuilder().WithVendorID(12345), func(p Package) uint32 { return p.GetVendorID() }, uint32(12345), "VendorID") +} + // TestPackageBuilder_Build_SetAppIDError tests Build when SetAppID fails. 
func TestPackageBuilder_Build_SetAppIDError(t *testing.T) { - ctx := context.Background() - builder := NewBuilder().WithAppID(67890) - - // Similar to SetVendorID, SetAppID only fails if Info is nil - // Since NewPackage always creates Info, we can't easily trigger this error - pkg, err := builder.Build(ctx) - if err != nil { - t.Fatalf("Build() failed: %v", err) - } - if pkg == nil { - t.Fatal("Build() returned nil package") - } - if pkg.GetAppID() != 67890 { - t.Errorf("AppID = %d, want 67890", pkg.GetAppID()) - } + runBuildAndAssertID(t, NewBuilder().WithAppID(67890), func(p Package) uint64 { return p.GetAppID() }, uint64(67890), "AppID") } diff --git a/api/go/novus_package/package_comment.go b/api/go/novus_package/package_comment.go index 65b7703c..7c4fbab1 100644 --- a/api/go/novus_package/package_comment.go +++ b/api/go/novus_package/package_comment.go @@ -8,8 +8,12 @@ package novus_package import ( + "fmt" + "unicode/utf8" + "github.com/novus-engine/novuspack/api/go/fileformat" "github.com/novus-engine/novuspack/api/go/metadata" + "github.com/novus-engine/novuspack/api/go/pkgerrors" ) // SetComment sets or updates the package comment. @@ -27,16 +31,8 @@ import ( // Specification: api_metadata.md: 1. 
Comment Management func (p *filePackage) SetComment(comment string) error { - // Create PackageComment instance - pc := metadata.NewPackageComment() - - // Set comment using PackageComment.SetComment which handles validation - if err := pc.SetComment(comment); err != nil { - return err - } - - // Validate the comment - if err := pc.Validate(); err != nil { + pc, err := buildPackageComment(comment) + if err != nil { return err } @@ -48,16 +44,16 @@ func (p *filePackage) SetComment(comment string) error { // Update header flags (bit 4 = FlagHasPackageComment) if pc.CommentLength > 0 { - p.header.SetFeature(fileformat.FlagHasPackageComment) + p.header.Flags |= fileformat.FlagHasPackageComment } else { - p.header.ClearFeature(fileformat.FlagHasPackageComment) + p.header.Flags &^= fileformat.FlagHasPackageComment } // Update PackageInfo // HasComment should be true only if there's actual comment text (not just null terminator) - commentText := pc.GetComment() + commentText := extractCommentText(pc) if p.Info != nil { - p.Info.HasComment = len(commentText) > 0 + p.Info.HasComment = commentText != "" p.Info.Comment = commentText // Increment MetadataVersion in PackageInfo (metadata changed) p.Info.MetadataVersion++ @@ -98,7 +94,7 @@ func (p *filePackage) ClearComment() error { p.header.CommentStart = 0 // Clear header flags (bit 4 = FlagHasPackageComment) - p.header.ClearFeature(fileformat.FlagHasPackageComment) + p.header.Flags &^= fileformat.FlagHasPackageComment // Update PackageInfo if p.Info != nil { @@ -126,3 +122,64 @@ func (p *filePackage) HasComment() bool { } return p.Info.HasComment } + +func buildPackageComment(comment string) (*metadata.PackageComment, error) { + pc := metadata.NewPackageComment() + if comment != "" && !utf8.ValidString(comment) { + return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "comment is not valid UTF-8", nil, pkgerrors.ValidationErrorContext{ + Field: "Comment", + Value: comment, + Expected: "valid UTF-8 string", + }) 
+ } + + commentBytes := []byte(comment) + if len(commentBytes) > 0 && commentBytes[len(commentBytes)-1] == 0x00 { + commentBytes = commentBytes[:len(commentBytes)-1] + } + for i, b := range commentBytes { + if b == 0x00 { + return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, fmt.Sprintf("comment contains embedded null character at position %d", i), nil, pkgerrors.ValidationErrorContext{ + Field: "Comment", + Value: i, + Expected: "no embedded null characters", + }) + } + } + + length := uint32(len(commentBytes) + 1) + if length > metadata.MaxCommentLength { + return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "comment length exceeds maximum", nil, pkgerrors.ValidationErrorContext{ + Field: "CommentLength", + Value: length, + Expected: fmt.Sprintf("<= %d", metadata.MaxCommentLength), + }) + } + + if len(commentBytes) > 0 { + pc.Comment = string(commentBytes) + "\x00" + } else { + pc.Comment = "\x00" + } + pc.CommentLength = length + pc.Reserved = [3]uint8{0, 0, 0} + + if err := pc.Validate(); err != nil { + return nil, err + } + + return pc, nil +} + +func extractCommentText(pc *metadata.PackageComment) string { + if pc == nil || pc.CommentLength == 0 || pc.Comment == "" { + return "" + } + + commentBytes := []byte(pc.Comment) + if len(commentBytes) > 0 && commentBytes[len(commentBytes)-1] == 0x00 { + return string(commentBytes[:len(commentBytes)-1]) + } + + return pc.Comment +} diff --git a/api/go/novus_package/package_comment_test.go b/api/go/novus_package/package_comment_test.go index 73db1cce..5b11715f 100644 --- a/api/go/novus_package/package_comment_test.go +++ b/api/go/novus_package/package_comment_test.go @@ -18,6 +18,8 @@ import ( // TEST: SetComment // ============================================================================= +const testCommentStr = "Test comment" + // TestPackage_SetComment_Basic tests basic SetComment operation. 
func TestPackage_SetComment_Basic(t *testing.T) { pkg, err := NewPackage() @@ -79,14 +81,14 @@ func TestPackage_SetComment_UpdatesHeaderFlags(t *testing.T) { fpkg := pkg.(*filePackage) // Set comment - comment := "Test comment" + comment := testCommentStr err = fpkg.SetComment(comment) if err != nil { t.Fatalf("SetComment() failed: %v", err) } // Verify header flag is set - if !fpkg.header.HasFeature(fileformat.FlagHasPackageComment) { + if (fpkg.header.Flags & fileformat.FlagHasPackageComment) == 0 { t.Error("Header flag FlagHasPackageComment should be set after SetComment") } @@ -172,7 +174,7 @@ func TestPackage_GetComment_Basic(t *testing.T) { } // Set comment and retrieve - testComment := "Test comment" + testComment := testCommentStr err = fpkg.SetComment(testComment) if err != nil { t.Fatalf("SetComment() failed: %v", err) @@ -199,7 +201,7 @@ func TestPackage_ClearComment_Basic(t *testing.T) { fpkg := pkg.(*filePackage) // Set comment first - err = fpkg.SetComment("Test comment") + err = fpkg.SetComment(testCommentStr) if err != nil { t.Fatalf("SetComment() failed: %v", err) } @@ -220,7 +222,7 @@ func TestPackage_ClearComment_Basic(t *testing.T) { } // Verify header flag is cleared - if fpkg.header.HasFeature(fileformat.FlagHasPackageComment) { + if (fpkg.header.Flags & fileformat.FlagHasPackageComment) != 0 { t.Error("Header flag FlagHasPackageComment should be cleared after ClearComment") } @@ -230,38 +232,29 @@ func TestPackage_ClearComment_Basic(t *testing.T) { } } -// TestPackage_ClearComment_NoComment tests clearing comment when no comment exists. 
-func TestPackage_ClearComment_NoComment(t *testing.T) { +func runClearCommentSucceeds(t *testing.T, errMsg string) { + t.Helper() pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } defer func() { _ = pkg.Close() }() - fpkg := pkg.(*filePackage) - - // ClearComment should succeed even when no comment exists err = fpkg.ClearComment() if err != nil { - t.Errorf("ClearComment() should succeed when no comment exists, got error: %v", err) + t.Errorf("ClearComment() %s: %v", errMsg, err) } } +// TestPackage_ClearComment_NoComment tests clearing comment when no comment exists. +func TestPackage_ClearComment_NoComment(t *testing.T) { + runClearCommentSucceeds(t, "should succeed when no comment exists") +} + // TestPackage_ClearComment_WithContext tests ClearComment (no longer applicable since ClearComment doesn't take context). // This test is kept for reference but ClearComment is now a pure in-memory operation per spec. func TestPackage_ClearComment_WithContext(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // ClearComment is a pure in-memory operation and doesn't take context per spec - err = fpkg.ClearComment() - if err != nil { - t.Errorf("ClearComment() failed: %v", err) - } + runClearCommentSucceeds(t, "failed") } // ============================================================================= @@ -284,7 +277,7 @@ func TestPackage_HasComment_Basic(t *testing.T) { } // Set comment - err = fpkg.SetComment("Test comment") + err = fpkg.SetComment(testCommentStr) if err != nil { t.Fatalf("SetComment() failed: %v", err) } @@ -321,7 +314,7 @@ func TestPackage_SetComment_WithNilInfo(t *testing.T) { fpkg.Info = nil // SetComment should handle nil Info gracefully - comment := "Test comment" + comment := testCommentStr err = fpkg.SetComment(comment) if err != nil { t.Errorf("SetComment() should handle nil Info, got 
error: %v", err) @@ -331,28 +324,25 @@ func TestPackage_SetComment_WithNilInfo(t *testing.T) { fpkg.Info = originalInfo } -// TestPackage_GetComment_WithNilInfo tests GetComment when Info is nil. -func TestPackage_GetComment_WithNilInfo(t *testing.T) { +func runGetCommentWithNilInfo(t *testing.T) { + t.Helper() pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } defer func() { _ = pkg.Close() }() - fpkg := pkg.(*filePackage) - - // Temporarily set Info to nil originalInfo := fpkg.Info fpkg.Info = nil - - // GetComment should return empty string when Info is nil - comment := fpkg.GetComment() - if comment != "" { + defer func() { fpkg.Info = originalInfo }() + if comment := fpkg.GetComment(); comment != "" { t.Errorf("GetComment() = %q, want empty string when Info is nil", comment) } +} - // Restore Info - fpkg.Info = originalInfo +// TestPackage_GetComment_WithNilInfo tests GetComment when Info is nil. +func TestPackage_GetComment_WithNilInfo(t *testing.T) { + runGetCommentWithNilInfo(t) } // TestPackage_HasComment_WithNilInfo tests HasComment when Info is nil. 
@@ -389,7 +379,7 @@ func TestPackage_ClearComment_WithNilInfo(t *testing.T) { fpkg := pkg.(*filePackage) // Set comment first - err = fpkg.SetComment("Test comment") + err = fpkg.SetComment(testCommentStr) if err != nil { t.Fatalf("SetComment() failed: %v", err) } diff --git a/api/go/novus_package/package_file_lookup.go b/api/go/novus_package/package_file_lookup.go index a95342b7..6a54f84d 100644 --- a/api/go/novus_package/package_file_lookup.go +++ b/api/go/novus_package/package_file_lookup.go @@ -184,7 +184,7 @@ func (p *filePackage) FindEntriesByTag(tagKey string, tagValue any) ([]*metadata // Parse tags from the data (JSON format) // For now, we'll do a simple string match in the JSON data // A full implementation would deserialize the JSON and compare properly - searchStr := fmt.Sprintf("\"%s\"", tagKey) + searchStr := fmt.Sprintf("%q", tagKey) if bytes.Contains(optData.Data, []byte(searchStr)) { matches = append(matches, entry) break @@ -239,7 +239,7 @@ func (p *filePackage) FindEntriesByType(fileType uint16) ([]*metadata.FileEntry, // - int: Number of regular content files // - error: *PackageError on failure // -// Specification: api_file_mgmt_queries.md: 3.2.1 FindEntriesByType Package Method +// Specification: api_file_mgmt_queries.md: 3.2.1 Package.FindEntriesByType Method func (p *filePackage) GetFileCount() (int, error) { // This is a pure in-memory operation if p.FileEntries == nil { diff --git a/api/go/novus_package/package_file_management.go b/api/go/novus_package/package_file_management.go index 1bc87209..62ce348e 100644 --- a/api/go/novus_package/package_file_management.go +++ b/api/go/novus_package/package_file_management.go @@ -33,7 +33,9 @@ import ( // - *metadata.FileEntry: The created file entry with complete metadata // - error: *PackageError on failure // -// Specification: api_file_mgmt_addition.md: 2.1 AddFile Package Method +// Specification: api_file_mgmt_addition.md: 2.1 Package.AddFile Method +// +//nolint:gocognit,gocyclo // 
validation and path-determination branches func (p *filePackage) AddFile(ctx context.Context, path string, options *AddFileOptions) (*metadata.FileEntry, error) { // Validate context if err := internal.CheckContext(ctx, "AddFile"); err != nil { @@ -266,7 +268,7 @@ func (p *filePackage) AddFile(ctx context.Context, path string, options *AddFile } else { // Add new path to existing entry (multi-path/alias) targetEntry = entry - entry.Paths = append(entry.Paths, generics.PathEntry{Path: storedPath}) + entry.Paths = append(entry.Paths, generics.PathEntry{PathLength: uint16(len(storedPath)), Path: storedPath}) entry.PathCount++ entry.MetadataVersion++ } @@ -301,7 +303,7 @@ func (p *filePackage) AddFile(ctx context.Context, path string, options *AddFile targetEntry = metadata.NewFileEntry() targetEntry.FileID = newFileID targetEntry.Type = 0 // TODO: Determine file type from extension/content - targetEntry.Paths = []generics.PathEntry{{Path: storedPath}} + targetEntry.Paths = []generics.PathEntry{{PathLength: uint16(len(storedPath)), Path: storedPath}} targetEntry.PathCount = 1 targetEntry.OriginalSize = originalSize targetEntry.RawChecksum = rawChecksum // May be 0 if no deduplication check @@ -402,7 +404,9 @@ func (p *filePackage) AddFile(ctx context.Context, path string, options *AddFile // - *metadata.FileEntry: The created file entry // - error: *PackageError on failure // -// Specification: api_file_mgmt_addition.md: 2.2 AddFileFromMemory Package Method +// Specification: api_file_mgmt_addition.md: 2.2 Package.AddFileFromMemory Method +// +//nolint:gocognit,gocyclo // validation and path branches func (p *filePackage) AddFileFromMemory(ctx context.Context, path string, data []byte, options *AddFileOptions) (*metadata.FileEntry, error) { // Validate context if err := internal.CheckContext(ctx, "AddFileFromMemory"); err != nil { @@ -483,7 +487,7 @@ func (p *filePackage) AddFileFromMemory(ctx context.Context, path string, data [ } else { // Add path to existing entry 
(multi-path support) targetEntry = entry - entry.Paths = append(entry.Paths, generics.PathEntry{Path: normalizedPath}) + entry.Paths = append(entry.Paths, generics.PathEntry{PathLength: uint16(len(normalizedPath)), Path: normalizedPath}) entry.PathCount++ } break @@ -500,7 +504,7 @@ func (p *filePackage) AddFileFromMemory(ctx context.Context, path string, data [ targetEntry = metadata.NewFileEntry() targetEntry.FileID = newFileID targetEntry.Type = 0 // Default type (could be enhanced with content detection) - targetEntry.Paths = []generics.PathEntry{{Path: normalizedPath}} + targetEntry.Paths = []generics.PathEntry{{PathLength: uint16(len(normalizedPath)), Path: normalizedPath}} targetEntry.PathCount = 1 targetEntry.OriginalSize = originalSize targetEntry.RawChecksum = rawChecksum @@ -547,34 +551,25 @@ func (p *filePackage) AddFileFromMemory(ctx context.Context, path string, data [ // - []*metadata.FileEntry: Slice of created file entries (stub returns nil) // - error: *PackageError with ErrTypeUnsupported // -// Specification: api_file_mgmt_addition.md: 2.4 AddFilePattern Package Method +// Specification: api_file_mgmt_addition.md: 2.4 Package.AddFilePattern Method func (p *filePackage) AddFilePattern(ctx context.Context, pattern string, options *AddFileOptions) ([]*metadata.FileEntry, error) { - // Validate context - if err := internal.CheckContext(ctx, "AddFilePattern"); err != nil { + return p.addStubWithContextAndNonEmpty(ctx, "AddFilePattern", pattern, "pattern", "non-empty glob pattern", "pattern cannot be empty", "AddFilePattern full implementation deferred to Priority 2") +} + +// addStubWithContextAndNonEmpty validates context and non-empty value, then returns ErrTypeUnsupported stub. 
+func (p *filePackage) addStubWithContextAndNonEmpty(ctx context.Context, opName, value, fieldName, expectedMsg, emptyErrMsg, stubMsg string) ([]*metadata.FileEntry, error) { + if err := internal.CheckContext(ctx, opName); err != nil { return nil, err } - - // Validate pattern is not empty - if pattern == "" { + if value == "" { return nil, pkgerrors.NewPackageError( pkgerrors.ErrTypeValidation, - "pattern cannot be empty", + emptyErrMsg, nil, - pkgerrors.ValidationErrorContext{ - Field: "pattern", - Value: pattern, - Expected: "non-empty glob pattern", - }, + pkgerrors.ValidationErrorContext{Field: fieldName, Value: value, Expected: expectedMsg}, ) } - - // Stub: Return unsupported error - return nil, pkgerrors.NewPackageError[struct{}]( - pkgerrors.ErrTypeUnsupported, - "AddFilePattern full implementation deferred to Priority 2", - nil, - struct{}{}, - ) + return nil, pkgerrors.NewPackageError[struct{}](pkgerrors.ErrTypeUnsupported, stubMsg, nil, struct{}{}) } // AddDirectory recursively adds files from a directory to the package. 
@@ -591,34 +586,9 @@ func (p *filePackage) AddFilePattern(ctx context.Context, pattern string, option // - []*metadata.FileEntry: Slice of created file entries (stub returns nil) // - error: *PackageError with ErrTypeUnsupported // -// Specification: api_file_mgmt_addition.md: 2.5 AddDirectory Package Method +// Specification: api_file_mgmt_addition.md: 2.5 Package.AddDirectory Method func (p *filePackage) AddDirectory(ctx context.Context, dirPath string, options *AddFileOptions) ([]*metadata.FileEntry, error) { - // Validate context - if err := internal.CheckContext(ctx, "AddDirectory"); err != nil { - return nil, err - } - - // Validate dirPath is not empty - if dirPath == "" { - return nil, pkgerrors.NewPackageError( - pkgerrors.ErrTypeValidation, - "directory path cannot be empty", - nil, - pkgerrors.ValidationErrorContext{ - Field: "dirPath", - Value: dirPath, - Expected: "non-empty directory path", - }, - ) - } - - // Stub: Return unsupported error - return nil, pkgerrors.NewPackageError[struct{}]( - pkgerrors.ErrTypeUnsupported, - "AddDirectory full implementation deferred to Priority 2", - nil, - struct{}{}, - ) + return p.addStubWithContextAndNonEmpty(ctx, "AddDirectory", dirPath, "dirPath", "non-empty directory path", "directory path cannot be empty", "AddDirectory full implementation deferred to Priority 2") } // RemoveFile removes a file from the package. @@ -635,6 +605,8 @@ func (p *filePackage) AddDirectory(ctx context.Context, dirPath string, options // - error: *PackageError on failure // // Specification: api_file_mgmt_removal.md: 2. 
RemoveFile Package Method +// +//nolint:gocognit,gocyclo // validation and removal branches func (p *filePackage) RemoveFile(ctx context.Context, path string) error { // Validate context if err := internal.CheckContext(ctx, "RemoveFile"); err != nil { @@ -778,7 +750,7 @@ func (p *filePackage) ensurePathMetadata(path string, fileEntry *metadata.FileEn // Create new path metadata entry pathEntry := &metadata.PathMetadataEntry{ - Path: generics.PathEntry{Path: path}, + Path: generics.PathEntry{PathLength: uint16(len(path)), Path: path}, Type: metadata.PathMetadataTypeFile, AssociatedFileEntries: []*metadata.FileEntry{fileEntry}, ParentPath: nil, // Could be enhanced to find parent @@ -793,6 +765,8 @@ func (p *filePackage) ensurePathMetadata(path string, fileEntry *metadata.FileEn // determineStoredPath determines the stored package path from the filesystem path. // Implements the complete path determination logic per api_file_mgmt_addition.md Section 2.6 (Path Determination Rules). +// +//nolint:gocognit,gocyclo // path-determination branches func (p *filePackage) determineStoredPath(filesystemPath string, options *AddFileOptions) (string, error) { // Validate that at most one path determination option is set optionsSet := 0 @@ -1017,7 +991,7 @@ func (p *filePackage) captureFilesystemMetadata(storedPath string, fileInfo os.F } // Always capture IsExecutable (required) - pathMetadata.FileSystem.IsExecutable = (fileInfo.Mode() & 0111) != 0 + pathMetadata.FileSystem.IsExecutable = (fileInfo.Mode() & 0o111) != 0 // Capture additional metadata if requested preservePermissions := false @@ -1094,6 +1068,26 @@ func (p *filePackage) deriveSessionBase(filesystemPath string, preserveDepth int return basePath } +// validateStubContextAndNonEmpty checks context and that value is non-empty; used by stub methods. 
+func (p *filePackage) validateStubContextAndNonEmpty(ctx context.Context, opName, value, emptyErrMsg, expectedMsg, fieldName string) error { + if err := internal.CheckContext(ctx, opName); err != nil { + return err + } + if value == "" { + return pkgerrors.NewPackageError( + pkgerrors.ErrTypeValidation, + emptyErrMsg, + nil, + pkgerrors.ValidationErrorContext{ + Field: fieldName, + Value: value, + Expected: expectedMsg, + }, + ) + } + return nil +} + // RemoveFilePattern removes files matching a pattern from the package. // // STUB IMPLEMENTATION: This method validates inputs but returns ErrTypeUnsupported. @@ -1104,36 +1098,20 @@ func (p *filePackage) deriveSessionBase(filesystemPath string, preserveDepth int // - pattern: Glob pattern to match files // // Returns: +// - []string: Nil slice (stub implementation) // - error: *PackageError with ErrTypeUnsupported // // Specification: api_file_mgmt_removal.md: 3. RemoveFilePattern Package Method -func (p *filePackage) RemoveFilePattern(ctx context.Context, pattern string) error { - // Validate context - if err := internal.CheckContext(ctx, "RemoveFilePattern"); err != nil { - return err - } - - // Validate pattern is not empty - if pattern == "" { - return pkgerrors.NewPackageError( - pkgerrors.ErrTypeValidation, - "pattern cannot be empty", - nil, - pkgerrors.ValidationErrorContext{ - Field: "pattern", - Value: pattern, - Expected: "non-empty glob pattern", - }, - ) +func (p *filePackage) RemoveFilePattern(ctx context.Context, pattern string) ([]string, error) { + if err := p.validateStubContextAndNonEmpty(ctx, "RemoveFilePattern", pattern, "pattern cannot be empty", "non-empty glob pattern", "pattern"); err != nil { + return nil, err } + return nil, p.returnStubUnsupported("RemoveFilePattern full implementation deferred to Priority 2") +} - // Stub: Return unsupported error - return pkgerrors.NewPackageError[struct{}]( - pkgerrors.ErrTypeUnsupported, - "RemoveFilePattern full implementation deferred to Priority 
2", - nil, - struct{}{}, - ) +// returnStubUnsupported returns ErrTypeUnsupported for stub methods. +func (p *filePackage) returnStubUnsupported(msg string) error { + return pkgerrors.NewPackageError[struct{}](pkgerrors.ErrTypeUnsupported, msg, nil, struct{}{}) } // RemoveDirectory removes files from a directory path in the package. @@ -1146,34 +1124,14 @@ func (p *filePackage) RemoveFilePattern(ctx context.Context, pattern string) err // - dirPath: Package directory path to remove files from // // Returns: +// - []string: Nil slice (stub implementation) // - error: *PackageError with ErrTypeUnsupported // // Specification: api_file_mgmt_removal.md: 4. RemoveDirectory Package Method -func (p *filePackage) RemoveDirectory(ctx context.Context, dirPath string) error { - // Validate context - if err := internal.CheckContext(ctx, "RemoveDirectory"); err != nil { - return err - } - - // Validate dirPath is not empty - if dirPath == "" { - return pkgerrors.NewPackageError( - pkgerrors.ErrTypeValidation, - "directory path cannot be empty", - nil, - pkgerrors.ValidationErrorContext{ - Field: "dirPath", - Value: dirPath, - Expected: "non-empty directory path", - }, - ) +func (p *filePackage) RemoveDirectory(ctx context.Context, dirPath string, options *RemoveDirectoryOptions) ([]string, error) { + _ = options + if err := p.validateStubContextAndNonEmpty(ctx, "RemoveDirectory", dirPath, "directory path cannot be empty", "non-empty directory path", "dirPath"); err != nil { + return nil, err } - - // Stub: Return unsupported error - return pkgerrors.NewPackageError[struct{}]( - pkgerrors.ErrTypeUnsupported, - "RemoveDirectory full implementation deferred to Priority 2", - nil, - struct{}{}, - ) + return nil, p.returnStubUnsupported("RemoveDirectory full implementation deferred to Priority 2") } diff --git a/api/go/novus_package/package_file_management_test.go b/api/go/novus_package/package_file_management_test.go index 9e21d41b..92b97b1f 100644 --- 
a/api/go/novus_package/package_file_management_test.go +++ b/api/go/novus_package/package_file_management_test.go @@ -31,7 +31,7 @@ func TestAddFile_BasicSuccess(t *testing.T) { tmpDir := t.TempDir() testFile := filepath.Join(tmpDir, "test.txt") testContent := []byte("Hello, World!") - if err := os.WriteFile(testFile, testContent, 0644); err != nil { + if err := os.WriteFile(testFile, testContent, 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -72,7 +72,7 @@ func TestAddFile_WithStoredPath(t *testing.T) { // Create temp test file tmpDir := t.TempDir() testFile := filepath.Join(tmpDir, "source.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { + if err := os.WriteFile(testFile, []byte("content"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -112,12 +112,12 @@ func TestAddFile_Deduplication(t *testing.T) { testContent := []byte("duplicate content") file1 := filepath.Join(tmpDir, "file1.txt") - if err := os.WriteFile(file1, testContent, 0644); err != nil { + if err := os.WriteFile(file1, testContent, 0o644); err != nil { t.Fatalf("Failed to create file1: %v", err) } file2 := filepath.Join(tmpDir, "file2.txt") - if err := os.WriteFile(file2, testContent, 0644); err != nil { + if err := os.WriteFile(file2, testContent, 0o644); err != nil { t.Fatalf("Failed to create file2: %v", err) } @@ -158,12 +158,12 @@ func TestAddFile_AllowDuplicate(t *testing.T) { testContent := []byte("duplicate content") file1 := filepath.Join(tmpDir, "file1.txt") - if err := os.WriteFile(file1, testContent, 0644); err != nil { + if err := os.WriteFile(file1, testContent, 0o644); err != nil { t.Fatalf("Failed to create file1: %v", err) } file2 := filepath.Join(tmpDir, "file2.txt") - if err := os.WriteFile(file2, testContent, 0644); err != nil { + if err := os.WriteFile(file2, testContent, 0o644); err != nil { t.Fatalf("Failed to create file2: %v", err) } @@ -190,12 +190,14 @@ func TestAddFile_AllowDuplicate(t 
*testing.T) { } // TestAddFile_Symlinks tests symlink handling with FollowSymlinks option. +// +//nolint:gocognit // table-driven symlink cases func TestAddFile_Symlinks(t *testing.T) { // Create temp test file and symlink tmpDir := t.TempDir() targetFile := filepath.Join(tmpDir, "target.txt") testContent := []byte("symlink target") - if err := os.WriteFile(targetFile, testContent, 0644); err != nil { + if err := os.WriteFile(targetFile, testContent, 0o644); err != nil { t.Fatalf("Failed to create target file: %v", err) } @@ -268,6 +270,8 @@ func TestAddFile_Symlinks(t *testing.T) { } // TestAddFile_ErrorCases tests various error conditions for AddFile. +// +//nolint:gocognit // table-driven error cases func TestAddFile_ErrorCases(t *testing.T) { pkg, err := NewPackage() if err != nil { @@ -307,7 +311,7 @@ func TestAddFile_ErrorCases(t *testing.T) { name: "Directory instead of file", setupFunc: func() string { dirPath := filepath.Join(tmpDir, "testdir") - _ = os.Mkdir(dirPath, 0755) + _ = os.Mkdir(dirPath, 0o755) return dirPath }, wantErrType: pkgerrors.ErrTypeValidation, @@ -355,7 +359,7 @@ func TestAddFile_ContextCancellation(t *testing.T) { // Create temp test file tmpDir := t.TempDir() testFile := filepath.Join(tmpDir, "test.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { + if err := os.WriteFile(testFile, []byte("content"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -522,24 +526,34 @@ func TestAddFileFromMemory_ErrorCases(t *testing.T) { }, } + runPathValidationTests(t, pkg, tests, ctx, func(pkg Package, ctx context.Context, path string) error { + _, err := pkg.AddFileFromMemory(ctx, path, testData, nil) + return err + }, "AddFileFromMemory") +} + +func runPathValidationTests(t *testing.T, pkg Package, tests []struct { + name string + path string + wantErrType pkgerrors.ErrorType + wantErrMsg string +}, ctx context.Context, fn func(Package, context.Context, string) error, opName string) { + 
t.Helper() for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - _, err := pkg.AddFileFromMemory(ctx, tt.path, testData, nil) + err := fn(pkg, ctx, tt.path) if err == nil { - t.Errorf("AddFileFromMemory(%q) succeeded, want error", tt.path) + t.Errorf("%s(%q) succeeded, want error", opName, tt.path) return } - pkgErr, ok := err.(*pkgerrors.PackageError) if !ok { t.Errorf("Error type = %T, want *pkgerrors.PackageError", err) return } - if pkgErr.Type != tt.wantErrType { t.Errorf("Error type = %v, want %v", pkgErr.Type, tt.wantErrType) } - if tt.wantErrMsg != "" && pkgErr.Message != tt.wantErrMsg { t.Errorf("Error message = %q, want %q", pkgErr.Message, tt.wantErrMsg) } @@ -623,50 +637,18 @@ func TestRemoveFile_ErrorCases(t *testing.T) { } ctx := context.Background() - tests := []struct { name string path string wantErrType pkgerrors.ErrorType wantErrMsg string }{ - { - name: "Empty path", - path: "", - wantErrType: pkgerrors.ErrTypeValidation, - wantErrMsg: "path cannot be empty or whitespace-only", - }, - { - name: "Whitespace-only path", - path: " ", - wantErrType: pkgerrors.ErrTypeValidation, - wantErrMsg: "path cannot be empty or whitespace-only", - }, - } - - for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - err := pkg.RemoveFile(ctx, tt.path) - if err == nil { - t.Errorf("RemoveFile(%q) succeeded, want error", tt.path) - return - } - - pkgErr, ok := err.(*pkgerrors.PackageError) - if !ok { - t.Errorf("Error type = %T, want *pkgerrors.PackageError", err) - return - } - - if pkgErr.Type != tt.wantErrType { - t.Errorf("Error type = %v, want %v", pkgErr.Type, tt.wantErrType) - } - - if tt.wantErrMsg != "" && pkgErr.Message != tt.wantErrMsg { - t.Errorf("Error message = %q, want %q", pkgErr.Message, tt.wantErrMsg) - } - }) + {"Empty path", "", pkgerrors.ErrTypeValidation, "path cannot be empty or whitespace-only"}, + {"Whitespace-only path", " ", pkgerrors.ErrTypeValidation, "path cannot be empty or whitespace-only"}, } + 
runPathValidationTests(t, pkg, tests, ctx, func(pkg Package, ctx context.Context, path string) error { + return pkg.RemoveFile(ctx, path) + }, "RemoveFile") } // ==================== @@ -729,12 +711,12 @@ func TestAddFile_WithBasePath(t *testing.T) { // Create test file in nested directory tmpDir := t.TempDir() nestedDir := filepath.Join(tmpDir, "project", "src") - if err := os.MkdirAll(nestedDir, 0755); err != nil { + if err := os.MkdirAll(nestedDir, 0o755); err != nil { t.Fatalf("Failed to create nested dir: %v", err) } testFile := filepath.Join(nestedDir, "main.go") - if err := os.WriteFile(testFile, []byte("package main"), 0644); err != nil { + if err := os.WriteFile(testFile, []byte("package main"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -772,12 +754,12 @@ func TestAddFile_WithPreserveDepth(t *testing.T) { // Create test file in nested directory tmpDir := t.TempDir() nestedDir := filepath.Join(tmpDir, "a", "b", "c") - if err := os.MkdirAll(nestedDir, 0755); err != nil { + if err := os.MkdirAll(nestedDir, 0o755); err != nil { t.Fatalf("Failed to create nested dir: %v", err) } testFile := filepath.Join(nestedDir, "file.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { + if err := os.WriteFile(testFile, []byte("content"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -807,12 +789,12 @@ func TestAddFile_WithFlattenPaths(t *testing.T) { // Create test file in nested directory tmpDir := t.TempDir() nestedDir := filepath.Join(tmpDir, "deep", "nested", "path") - if err := os.MkdirAll(nestedDir, 0755); err != nil { + if err := os.MkdirAll(nestedDir, 0o755); err != nil { t.Fatalf("Failed to create nested dir: %v", err) } testFile := filepath.Join(nestedDir, "file.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { + if err := os.WriteFile(testFile, []byte("content"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -846,12 
+828,12 @@ func TestAddFile_WithSessionBase(t *testing.T) { // Create test file in nested directory tmpDir := t.TempDir() projectDir := filepath.Join(tmpDir, "myproject") - if err := os.MkdirAll(projectDir, 0755); err != nil { + if err := os.MkdirAll(projectDir, 0o755); err != nil { t.Fatalf("Failed to create project dir: %v", err) } testFile := filepath.Join(projectDir, "readme.txt") - if err := os.WriteFile(testFile, []byte("README"), 0644); err != nil { + if err := os.WriteFile(testFile, []byte("README"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -891,7 +873,7 @@ func TestAddFile_MultiplePathsForSameFile(t *testing.T) { tmpDir := t.TempDir() testFile := filepath.Join(tmpDir, "shared.txt") testContent := []byte("shared content") - if err := os.WriteFile(testFile, testContent, 0644); err != nil { + if err := os.WriteFile(testFile, testContent, 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } diff --git a/api/go/novus_package/package_identity.go b/api/go/novus_package/package_identity.go index 61c5d395..5c2e7f49 100644 --- a/api/go/novus_package/package_identity.go +++ b/api/go/novus_package/package_identity.go @@ -13,6 +13,14 @@ import ( "github.com/novus-engine/novuspack/api/go/pkgerrors" ) +// ensureInfoNotNil returns *PackageError if p.Info is nil. +func (p *filePackage) ensureInfoNotNil() error { + if p.Info == nil { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "package info is nil", nil, struct{}{}) + } + return nil +} + // SetAppID sets or updates the package AppID. // // Sets the AppID in PackageInfo (the single source of truth) and syncs to header. @@ -25,16 +33,11 @@ import ( // // Specification: api_metadata.md: 1. 
Comment Management func (p *filePackage) SetAppID(appID uint64) error { - // Validate package state - if p.Info == nil { - return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "package info is nil", nil, struct{}{}) + if err := p.ensureInfoNotNil(); err != nil { + return err } - - // Update PackageInfo (single source of truth) p.Info.AppID = appID - // Increment MetadataVersion in PackageInfo (metadata changed) p.Info.MetadataVersion++ - return nil } @@ -91,18 +94,11 @@ func (p *filePackage) HasAppID() bool { // // Specification: api_metadata.md: 2. AppID Management func (p *filePackage) SetVendorID(vendorID uint32) error { - // Validate package state - if p.Info == nil { - return pkgerrors.NewPackageError( - pkgerrors.ErrTypeValidation, "package info is nil", nil, struct{}{}, - ) + if err := p.ensureInfoNotNil(); err != nil { + return err } - - // Update PackageInfo (single source of truth) p.Info.VendorID = vendorID - // Increment MetadataVersion in PackageInfo (metadata changed) p.Info.MetadataVersion++ - return nil } @@ -184,7 +180,7 @@ func (p *filePackage) SetPackageIdentity(vendorID uint32, appID uint64) error { // - uint64: Current package AppID // // Specification: api_metadata.md: 4. 
Combined Management -func (p *filePackage) GetPackageIdentity() (uint32, uint64) { +func (p *filePackage) GetPackageIdentity() (vendorID uint32, appID uint64) { return p.GetVendorID(), p.GetAppID() } diff --git a/api/go/novus_package/package_identity_test.go b/api/go/novus_package/package_identity_test.go index d7183111..c8a4f360 100644 --- a/api/go/novus_package/package_identity_test.go +++ b/api/go/novus_package/package_identity_test.go @@ -13,133 +13,146 @@ import ( "github.com/novus-engine/novuspack/api/go/pkgerrors" ) -// ============================================================================= -// TEST: SetAppID / GetAppID / HasAppID / ClearAppID -// ============================================================================= - -// TestPackage_SetAppID_Basic tests basic SetAppID operation. -func TestPackage_SetAppID_Basic(t *testing.T) { +func mustNewFilePackage(t *testing.T) (Package, *filePackage) { + t.Helper() pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } - defer func() { _ = pkg.Close() }() + return pkg, pkg.(*filePackage) +} - fpkg := pkg.(*filePackage) +// identityOps holds set/get/has/clear/info callbacks for AppID or VendorID. +// Used by table-driven identity tests to avoid duplicating test logic. 
+type ( + setFn func(*filePackage, interface{}) error + getFn func(*filePackage) interface{} + hasFn func(*filePackage) bool + clearFn func(*filePackage) error + infoFn func(*filePackage) interface{} +) +type identityOps struct { + set setFn + get getFn + has hasFn + clear clearFn + info infoFn +} - // Set AppID - appID := uint64(12345) - err = fpkg.SetAppID(appID) - if err != nil { - t.Errorf("SetAppID() failed: %v", err) +func runIdentitySetGetHas(t *testing.T, fpkg *filePackage, val interface{}, ops identityOps, fieldName string) { + t.Helper() + if err := ops.set(fpkg, val); err != nil { + t.Errorf("Set%s() failed: %v", fieldName, err) } - - // Verify AppID was set - retrieved := fpkg.GetAppID() - if retrieved != appID { - t.Errorf("GetAppID() = %d, want %d", retrieved, appID) + if got := ops.get(fpkg); got != val { + t.Errorf("Get%s() = %v, want %v", fieldName, got, val) } - - if !fpkg.HasAppID() { - t.Error("HasAppID() should return true after SetAppID") + if !ops.has(fpkg) { + t.Errorf("Has%s() should return true after Set", fieldName) } } -// TestPackage_ClearAppID_Basic tests basic ClearAppID operation. 
-func TestPackage_ClearAppID_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) +func runIdentityClear(t *testing.T, fpkg *filePackage, setVal interface{}, ops identityOps, fieldName string) { + t.Helper() + if err := ops.set(fpkg, setVal); err != nil { + t.Fatalf("Set%s() failed: %v", fieldName, err) } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - - // Set AppID first - err = fpkg.SetAppID(12345) - if err != nil { - t.Fatalf("SetAppID() failed: %v", err) + if err := ops.clear(fpkg); err != nil { + t.Errorf("Clear%s() failed: %v", fieldName, err) } - - // Clear AppID - err = fpkg.ClearAppID() - if err != nil { - t.Errorf("ClearAppID() failed: %v", err) + if ops.get(fpkg) != uint64(0) && ops.get(fpkg) != uint32(0) { + t.Errorf("Get%s() after Clear = %v, want 0", fieldName, ops.get(fpkg)) } - - // Verify AppID was cleared - if fpkg.GetAppID() != 0 { - t.Errorf("GetAppID() = %d, want 0", fpkg.GetAppID()) + if ops.has(fpkg) { + t.Errorf("Has%s() should return false after Clear", fieldName) } +} - if fpkg.HasAppID() { - t.Error("HasAppID() should return false after ClearAppID") +func runIdentityHasLifecycle(t *testing.T, fpkg *filePackage, setVal interface{}, ops identityOps, fieldName string) { + t.Helper() + if ops.has(fpkg) { + t.Errorf("Has%s() should return false for new package", fieldName) + } + if err := ops.set(fpkg, setVal); err != nil { + t.Fatalf("Set%s() failed: %v", fieldName, err) + } + if !ops.has(fpkg) { + t.Errorf("Has%s() should return true after Set", fieldName) + } + if err := ops.clear(fpkg); err != nil { + t.Fatalf("Clear%s() failed: %v", fieldName, err) + } + if ops.has(fpkg) { + t.Errorf("Has%s() should return false after Clear", fieldName) } } -// TestPackage_HasAppID_Basic tests basic HasAppID operation. 
-func TestPackage_HasAppID_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) +func runIdentityPersistsInInfo(t *testing.T, fpkg *filePackage, val interface{}, ops identityOps, fieldName string) { + t.Helper() + if err := ops.set(fpkg, val); err != nil { + t.Fatalf("Set%s() failed: %v", fieldName, err) } - defer func() { _ = pkg.Close() }() + if ops.info(fpkg) != val { + t.Errorf("Info.%s = %v, want %v", fieldName, ops.info(fpkg), val) + } +} - fpkg := pkg.(*filePackage) +func makeIdentityOps(set setFn, get getFn, has hasFn, clearOp clearFn, info infoFn) identityOps { + return identityOps{set: set, get: get, has: has, clear: clearOp, info: info} +} - // Initially no AppID - if fpkg.HasAppID() { - t.Error("HasAppID() should return false for new package") - } +func appIDIdentityOps() identityOps { + return makeIdentityOps( + func(fpkg *filePackage, v interface{}) error { return fpkg.SetAppID(v.(uint64)) }, + func(fpkg *filePackage) interface{} { return fpkg.GetAppID() }, + func(fpkg *filePackage) bool { return fpkg.HasAppID() }, + func(fpkg *filePackage) error { return fpkg.ClearAppID() }, + func(fpkg *filePackage) interface{} { return fpkg.Info.AppID }, + ) +} - // Set AppID - err = fpkg.SetAppID(12345) - if err != nil { - t.Fatalf("SetAppID() failed: %v", err) - } +func vendorIDIdentityOps() identityOps { + set := func(fpkg *filePackage, v interface{}) error { return fpkg.SetVendorID(v.(uint32)) } + get := func(fpkg *filePackage) interface{} { return fpkg.GetVendorID() } + has := func(fpkg *filePackage) bool { return fpkg.HasVendorID() } + clearFn := func(fpkg *filePackage) error { return fpkg.ClearVendorID() } + info := func(fpkg *filePackage) interface{} { return fpkg.Info.VendorID } + return identityOps{set: set, get: get, has: has, clear: clearFn, info: info} +} - // Verify HasAppID returns true - if !fpkg.HasAppID() { - t.Error("HasAppID() should return true after SetAppID") - } +var appIDOps = 
appIDIdentityOps() +var vendorIDOps = vendorIDIdentityOps() - // Clear AppID - err = fpkg.ClearAppID() - if err != nil { - t.Fatalf("ClearAppID() failed: %v", err) - } +// ============================================================================= +// TEST: SetAppID / GetAppID / HasAppID / ClearAppID +// ============================================================================= - // Verify HasAppID returns false - if fpkg.HasAppID() { - t.Error("HasAppID() should return false after ClearAppID") - } +// TestPackage_SetAppID_Basic tests basic SetAppID operation. +func TestPackage_SetAppID_Basic(t *testing.T) { + pkg, fpkg := mustNewFilePackage(t) + defer func() { _ = pkg.Close() }() + runIdentitySetGetHas(t, fpkg, uint64(12345), appIDOps, "AppID") } -// TestPackage_AppID_PersistsInHeader tests that AppID is stored in Info and synced to header. -func TestPackage_AppID_PersistsInHeader(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } +// TestPackage_ClearAppID_Basic tests basic ClearAppID operation. +func TestPackage_ClearAppID_Basic(t *testing.T) { + pkg, fpkg := mustNewFilePackage(t) defer func() { _ = pkg.Close() }() + runIdentityClear(t, fpkg, uint64(12345), appIDOps, "AppID") +} - fpkg := pkg.(*filePackage) - - // Set AppID - appID := uint64(54321) - err = fpkg.SetAppID(appID) - if err != nil { - t.Fatalf("SetAppID() failed: %v", err) - } - - // Verify PackageInfo (single source of truth) - if fpkg.Info.AppID != appID { - t.Errorf("Info.AppID = %d, want %d", fpkg.Info.AppID, appID) - } +// TestPackage_HasAppID_Basic tests basic HasAppID operation. +func TestPackage_HasAppID_Basic(t *testing.T) { + pkg, fpkg := mustNewFilePackage(t) + defer func() { _ = pkg.Close() }() + runIdentityHasLifecycle(t, fpkg, uint64(12345), appIDOps, "AppID") +} - // Note: Header is NOT synced immediately after mutations. - // It will be synced during write operations (Write/SafeWrite/FastWrite). 
- // This follows the "PackageInfo as single source of truth" pattern. +// TestPackage_AppID_PersistsInHeader tests that AppID is stored in Info and synced to header. +func TestPackage_AppID_PersistsInHeader(t *testing.T) { + pkg, fpkg := mustNewFilePackage(t) + defer func() { _ = pkg.Close() }() + runIdentityPersistsInInfo(t, fpkg, uint64(54321), appIDOps, "AppID") } // ============================================================================= @@ -148,127 +161,30 @@ func TestPackage_AppID_PersistsInHeader(t *testing.T) { // TestPackage_SetVendorID_Basic tests basic SetVendorID operation. func TestPackage_SetVendorID_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } + pkg, fpkg := mustNewFilePackage(t) defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - - // Set VendorID - vendorID := uint32(67890) - err = fpkg.SetVendorID(vendorID) - if err != nil { - t.Errorf("SetVendorID() failed: %v", err) - } - - // Verify VendorID was set - retrieved := fpkg.GetVendorID() - if retrieved != vendorID { - t.Errorf("GetVendorID() = %d, want %d", retrieved, vendorID) - } - - if !fpkg.HasVendorID() { - t.Error("HasVendorID() should return true after SetVendorID") - } + runIdentitySetGetHas(t, fpkg, uint32(67890), vendorIDOps, "VendorID") } // TestPackage_ClearVendorID_Basic tests basic ClearVendorID operation. 
func TestPackage_ClearVendorID_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } + pkg, fpkg := mustNewFilePackage(t) defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - - // Set VendorID first - err = fpkg.SetVendorID(67890) - if err != nil { - t.Fatalf("SetVendorID() failed: %v", err) - } - - // Clear VendorID - err = fpkg.ClearVendorID() - if err != nil { - t.Errorf("ClearVendorID() failed: %v", err) - } - - // Verify VendorID was cleared - if fpkg.GetVendorID() != 0 { - t.Errorf("GetVendorID() = %d, want 0", fpkg.GetVendorID()) - } - - if fpkg.HasVendorID() { - t.Error("HasVendorID() should return false after ClearVendorID") - } + runIdentityClear(t, fpkg, uint32(67890), vendorIDOps, "VendorID") } // TestPackage_HasVendorID_Basic tests basic HasVendorID operation. func TestPackage_HasVendorID_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } + pkg, fpkg := mustNewFilePackage(t) defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - - // Initially no VendorID - if fpkg.HasVendorID() { - t.Error("HasVendorID() should return false for new package") - } - - // Set VendorID - err = fpkg.SetVendorID(67890) - if err != nil { - t.Fatalf("SetVendorID() failed: %v", err) - } - - // Verify HasVendorID returns true - if !fpkg.HasVendorID() { - t.Error("HasVendorID() should return true after SetVendorID") - } - - // Clear VendorID - err = fpkg.ClearVendorID() - if err != nil { - t.Fatalf("ClearVendorID() failed: %v", err) - } - - // Verify HasVendorID returns false - if fpkg.HasVendorID() { - t.Error("HasVendorID() should return false after ClearVendorID") - } + runIdentityHasLifecycle(t, fpkg, uint32(67890), vendorIDOps, "VendorID") } // TestPackage_VendorID_PersistsInHeader tests that VendorID is stored in Info and synced to header. 
func TestPackage_VendorID_PersistsInHeader(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } + pkg, fpkg := mustNewFilePackage(t) defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - - // Set VendorID - vendorID := uint32(98765) - err = fpkg.SetVendorID(vendorID) - if err != nil { - t.Fatalf("SetVendorID() failed: %v", err) - } - - // Verify PackageInfo (single source of truth) - if fpkg.Info.VendorID != vendorID { - t.Errorf("Info.VendorID = %d, want %d", fpkg.Info.VendorID, vendorID) - } - - // Note: Header is NOT synced immediately after mutations. - // It will be synced during write operations (Write/SafeWrite/FastWrite). - // This follows the "PackageInfo as single source of truth" pattern. + runIdentityPersistsInInfo(t, fpkg, uint32(98765), vendorIDOps, "VendorID") } // ============================================================================= @@ -370,52 +286,31 @@ func TestPackage_GetPackageIdentity_Basic(t *testing.T) { } } -// TestPackage_GetAppID_WithNilHeader tests GetAppID when header is nil. -func TestPackage_GetAppID_WithNilHeader(t *testing.T) { +func runGetterWithNilHeader(t *testing.T, getter func(*filePackage) interface{}, want interface{}, name string) { + t.Helper() pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } defer func() { _ = pkg.Close() }() - fpkg := pkg.(*filePackage) - - // Temporarily set header to nil originalHeader := fpkg.header fpkg.header = nil - - // GetAppID should return 0 when header is nil - appID := fpkg.GetAppID() - if appID != 0 { - t.Errorf("GetAppID() = %d, want 0 when header is nil", appID) + defer func() { fpkg.header = originalHeader }() + got := getter(fpkg) + if got != want { + t.Errorf("%s() = %v, want %v when header is nil", name, got, want) } +} - // Restore header - fpkg.header = originalHeader +// TestPackage_GetAppID_WithNilHeader tests GetAppID when header is nil. 
+func TestPackage_GetAppID_WithNilHeader(t *testing.T) { + runGetterWithNilHeader(t, func(fpkg *filePackage) interface{} { return fpkg.GetAppID() }, uint64(0), "GetAppID") } // TestPackage_GetVendorID_WithNilHeader tests GetVendorID when header is nil. func TestPackage_GetVendorID_WithNilHeader(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - - // Temporarily set header to nil - originalHeader := fpkg.header - fpkg.header = nil - - // GetVendorID should return 0 when header is nil - vendorID := fpkg.GetVendorID() - if vendorID != 0 { - t.Errorf("GetVendorID() = %d, want 0 when header is nil", vendorID) - } - - // Restore header - fpkg.header = originalHeader + runGetterWithNilHeader(t, func(fpkg *filePackage) interface{} { return fpkg.GetVendorID() }, uint32(0), "GetVendorID") } // TestPackage_SetPackageIdentity_WithNilHeader removed - obsolete test. @@ -449,212 +344,116 @@ func TestPackage_GetVendorID_WithNilHeader(t *testing.T) { // TestPackage_GetAppID_WithNilInfo tests GetAppID when Info is nil. // Expected: Should return 0 gracefully func TestPackage_GetAppID_WithNilInfo(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Manually set Info to nil - fpkg.Info = nil - - // GetAppID should return 0 (not panic) - appID := fpkg.GetAppID() - if appID != 0 { - t.Errorf("GetAppID() with nil Info = %d, want 0", appID) - } + runWithNilInfo(t, func(t *testing.T, fpkg *filePackage) { + appID := fpkg.GetAppID() + if appID != 0 { + t.Errorf("GetAppID() with nil Info = %d, want 0", appID) + } + }) } // TestPackage_GetVendorID_WithNilInfo tests GetVendorID when Info is nil. 
// Expected: Should return 0 gracefully func TestPackage_GetVendorID_WithNilInfo(t *testing.T) { + runWithNilInfo(t, func(t *testing.T, fpkg *filePackage) { + vendorID := fpkg.GetVendorID() + if vendorID != 0 { + t.Errorf("GetVendorID() with nil Info = %d, want 0", vendorID) + } + }) +} + +func runWithNilInfo(t *testing.T, check func(t *testing.T, fpkg *filePackage)) { + t.Helper() pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } defer func() { _ = pkg.Close() }() - fpkg := pkg.(*filePackage) - // Manually set Info to nil fpkg.Info = nil - - // GetVendorID should return 0 (not panic) - vendorID := fpkg.GetVendorID() - if vendorID != 0 { - t.Errorf("GetVendorID() with nil Info = %d, want 0", vendorID) - } + check(t, fpkg) } // TestPackage_HasAppID_WithNilInfo tests HasAppID when Info is nil. // Expected: Should return false gracefully func TestPackage_HasAppID_WithNilInfo(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Manually set Info to nil - fpkg.Info = nil - - // HasAppID should return false (not panic) - if fpkg.HasAppID() { - t.Error("HasAppID() with nil Info should return false") - } + runWithNilInfo(t, func(t *testing.T, fpkg *filePackage) { + if fpkg.HasAppID() { + t.Error("HasAppID() with nil Info should return false") + } + }) } // TestPackage_HasVendorID_WithNilInfo tests HasVendorID when Info is nil. 
// Expected: Should return false gracefully func TestPackage_HasVendorID_WithNilInfo(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Manually set Info to nil - fpkg.Info = nil - - // HasVendorID should return false (not panic) - if fpkg.HasVendorID() { - t.Error("HasVendorID() with nil Info should return false") - } + runWithNilInfo(t, func(t *testing.T, fpkg *filePackage) { + if fpkg.HasVendorID() { + t.Error("HasVendorID() with nil Info should return false") + } + }) } // TestPackage_GetPackageIdentity_WithNilInfo tests GetPackageIdentity when Info is nil. // Expected: Should return zeros gracefully func TestPackage_GetPackageIdentity_WithNilInfo(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Manually set Info to nil - fpkg.Info = nil + runWithNilInfo(t, func(t *testing.T, fpkg *filePackage) { + vendorID, appID := fpkg.GetPackageIdentity() + if vendorID != 0 { + t.Errorf("GetPackageIdentity() VendorID with nil Info = %d, want 0", vendorID) + } + if appID != 0 { + t.Errorf("GetPackageIdentity() AppID with nil Info = %d, want 0", appID) + } + }) +} - // GetPackageIdentity should return zeros (not panic) - vendorID, appID := fpkg.GetPackageIdentity() - if vendorID != 0 { - t.Errorf("GetPackageIdentity() VendorID with nil Info = %d, want 0", vendorID) - } - if appID != 0 { - t.Errorf("GetPackageIdentity() AppID with nil Info = %d, want 0", appID) - } +func assertSetWithNilInfoReturnsValidationError(t *testing.T, setFn func(*filePackage) error) { + t.Helper() + runWithNilInfo(t, func(t *testing.T, fpkg *filePackage) { + err := setFn(fpkg) + if err == nil { + t.Error("Set with nil Info should return error") + } + pkgErr := &pkgerrors.PackageError{} + if !asPackageError(err, pkgErr) { + t.Fatalf("Expected 
PackageError, got: %T", err) + } + if pkgErr.Type != pkgerrors.ErrTypeValidation { + t.Errorf("Expected error type Validation, got: %v", pkgErr.Type) + } + }) } // TestPackage_SetAppID_WithNilInfo tests SetAppID when Info is nil. // Expected: Should return validation error func TestPackage_SetAppID_WithNilInfo(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Manually set Info to nil - fpkg.Info = nil - - // SetAppID should return error - err = fpkg.SetAppID(12345) - if err == nil { - t.Error("SetAppID() with nil Info should return error") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeValidation { - t.Errorf("Expected error type Validation, got: %v", pkgErr.Type) - } + assertSetWithNilInfoReturnsValidationError(t, func(fpkg *filePackage) error { + return fpkg.SetAppID(12345) + }) } // TestPackage_SetVendorID_WithNilInfo tests SetVendorID when Info is nil. 
// Expected: Should return validation error func TestPackage_SetVendorID_WithNilInfo(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Manually set Info to nil - fpkg.Info = nil - - // SetVendorID should return error - err = fpkg.SetVendorID(12345) - if err == nil { - t.Error("SetVendorID() with nil Info should return error") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeValidation { - t.Errorf("Expected error type Validation, got: %v", pkgErr.Type) - } + assertSetWithNilInfoReturnsValidationError(t, func(fpkg *filePackage) error { + return fpkg.SetVendorID(12345) + }) } // TestPackage_SetPackageIdentity_WithNilInfo tests SetPackageIdentity when Info is nil. // Expected: Should return validation error func TestPackage_SetPackageIdentity_WithNilInfo(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Manually set Info to nil - fpkg.Info = nil - - // SetPackageIdentity should return error - err = fpkg.SetPackageIdentity(12345, 67890) - if err == nil { - t.Error("SetPackageIdentity() with nil Info should return error") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeValidation { - t.Errorf("Expected error type Validation, got: %v", pkgErr.Type) - } + assertSetWithNilInfoReturnsValidationError(t, func(fpkg *filePackage) error { + return fpkg.SetPackageIdentity(12345, 67890) + }) } // TestPackage_ClearPackageIdentity_WithNilInfo tests ClearPackageIdentity when Info is nil. 
// Expected: Should return validation error func TestPackage_ClearPackageIdentity_WithNilInfo(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Manually set Info to nil - fpkg.Info = nil - - // ClearPackageIdentity should return error - err = fpkg.ClearPackageIdentity() - if err == nil { - t.Error("ClearPackageIdentity() with nil Info should return error") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeValidation { - t.Errorf("Expected error type Validation, got: %v", pkgErr.Type) - } + assertSetWithNilInfoReturnsValidationError(t, func(fpkg *filePackage) error { + return fpkg.ClearPackageIdentity() + }) } diff --git a/api/go/novus_package/package_lifecycle.go b/api/go/novus_package/package_lifecycle.go index 90e84d75..983811b5 100644 --- a/api/go/novus_package/package_lifecycle.go +++ b/api/go/novus_package/package_lifecycle.go @@ -13,8 +13,10 @@ package novus_package import ( "context" + "encoding/binary" "fmt" "io" + "runtime" "strings" "time" @@ -98,7 +100,7 @@ func (p *filePackage) GetPath() string { // // Package is configured in memory, not yet written to disk // // Call Write() to actually write to disk // -// Specification: api_basic_operations.md: 6.2 Create Method +// Specification: api_basic_operations.md: 7.2 NewPackageWithOptions Behavior func (p *filePackage) Create(ctx context.Context, path string) error { // Validate context if err := internal.CheckContext(ctx, "Create"); err != nil { @@ -132,7 +134,7 @@ func (p *filePackage) Create(ctx context.Context, path string) error { // Calculate index position (right after header) - for future Write operations indexStart := uint64(fileformat.PackageHeaderSize) - indexSize := uint64(p.index.Size()) + indexSize := fileIndexSize(p.index) // Update header with 
index location (in memory only) p.header.IndexStart = indexStart @@ -179,7 +181,9 @@ func (p *filePackage) Create(ctx context.Context, path string) error { // } // fmt.Printf("Files: %d\n", info.FileCount) // -// Specification: api_basic_operations.md: 7.1 OpenPackage +// Specification: api_basic_operations.md: 10. OpenPackage Function +// +//nolint:gocognit,gocyclo // open/read/validate branches func OpenPackage(ctx context.Context, path string) (Package, error) { // Validate context if err := internal.CheckContext(ctx, "OpenPackage"); err != nil { @@ -215,13 +219,13 @@ func OpenPackage(ctx context.Context, path string) (Package, error) { } // Read the index - if _, err = index.ReadFrom(file); err != nil { + if _, err = readFileIndexFrom(file, index); err != nil { _ = file.Close() // Ignore error on cleanup path return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeIO, "failed to read file index", err, struct{}{}) } // Validate the index - if err := index.Validate(); err != nil { + if err := validateFileIndex(index); err != nil { _ = file.Close() // Ignore error on cleanup path return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "invalid file index", err, struct{}{}) } @@ -317,7 +321,7 @@ func OpenPackage(ctx context.Context, path string) (Package, error) { } // Load package comment if it exists - if header.HasComment() && header.CommentSize > 0 { + if header.CommentSize > 0 { if _, err := file.Seek(int64(header.CommentStart), 0); err != nil { _ = file.Close() return nil, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to seek to comment", pkgerrors.ValidationErrorContext{ @@ -348,11 +352,11 @@ func OpenPackage(ctx context.Context, path string) (Package, error) { } pkg.Info.HasComment = true - pkg.Info.Comment = comment.GetComment() // Use GetComment() to strip null terminator + pkg.Info.Comment = extractCommentText(comment) // Strip null terminator } // Load signature metadata if it exists - if header.IsSigned() && 
header.SignatureOffset > 0 { + if header.SignatureOffset > 0 { // TODO: Implement full signature loading (incremental signing support) // For now, just mark that signatures exist pkg.Info.HasSignatures = true @@ -376,7 +380,7 @@ func OpenPackage(ctx context.Context, path string) (Package, error) { } // Update compression info from header - pkg.Info.PackageCompression = uint8(header.GetCompressionType()) + pkg.Info.PackageCompression = extractCompressionType(header) pkg.Info.IsPackageCompressed = (pkg.Info.PackageCompression != 0) // TODO: Calculate PackageOriginalSize and PackageCompressedSize when package compression is implemented @@ -512,9 +516,9 @@ func OpenBrokenPackage(ctx context.Context, path string) (Package, error) { if header.IndexStart > 0 && header.IndexSize > 0 { if _, err := file.Seek(int64(header.IndexStart), 0); err == nil { index := fileformat.NewFileIndex() - if _, err := index.ReadFrom(file); err == nil { + if _, err := readFileIndexFrom(file, index); err == nil { // Only use the index if it validates successfully - if err := index.Validate(); err == nil { + if err := validateFileIndex(index); err == nil { pkg.index = index } } @@ -524,7 +528,7 @@ func OpenBrokenPackage(ctx context.Context, path string) (Package, error) { // Populate basic Info from header pkg.Info.FormatVersion = header.FormatVersion pkg.Info.FileCount = int(pkg.index.EntryCount) - pkg.Info.PackageCompression = uint8(header.GetCompressionType()) + pkg.Info.PackageCompression = extractCompressionType(header) pkg.Info.IsPackageCompressed = (pkg.Info.PackageCompression != 0) return pkg, nil @@ -605,7 +609,7 @@ func (p *readOnlyPackage) HasVendorID() bool { return p.inner.HasVendorID() } -func (p *readOnlyPackage) GetPackageIdentity() (uint32, uint64) { +func (p *readOnlyPackage) GetPackageIdentity() (vendorID uint32, appID uint64) { return p.inner.GetPackageIdentity() } @@ -683,12 +687,12 @@ func (p *readOnlyPackage) RemoveFile(ctx context.Context, path string) error { return 
p.readOnlyError("RemoveFile") } -func (p *readOnlyPackage) RemoveFilePattern(ctx context.Context, pattern string) error { - return p.readOnlyError("RemoveFilePattern") +func (p *readOnlyPackage) RemoveFilePattern(ctx context.Context, pattern string) ([]string, error) { + return nil, p.readOnlyError("RemoveFilePattern") } -func (p *readOnlyPackage) RemoveDirectory(ctx context.Context, dirPath string) error { - return p.readOnlyError("RemoveDirectory") +func (p *readOnlyPackage) RemoveDirectory(ctx context.Context, dirPath string, options *RemoveDirectoryOptions) ([]string, error) { + return nil, p.readOnlyError("RemoveDirectory") } // Target path management is rejected. @@ -729,6 +733,181 @@ func (p *readOnlyPackage) GetPath() string { return p.inner.GetPath() } +func extractCompressionType(header *fileformat.PackageHeader) uint8 { + if header == nil { + return 0 + } + return uint8((header.Flags & fileformat.FlagsMaskCompressionType) >> fileformat.FlagsShiftCompressionType) +} + +func fileIndexSize(index *fileformat.FileIndex) uint64 { + if index == nil { + return 0 + } + return uint64(16 + len(index.Entries)*fileformat.IndexEntrySize) +} + +func readFileIndexFrom(r io.Reader, index *fileformat.FileIndex) (int64, error) { + if index == nil { + return 0, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "file index is nil", nil, struct{}{}) + } + + entryCount, reserved, firstEntryOffset, totalRead, err := readFileIndexHeader(r) + if err != nil { + return totalRead, err + } + index.EntryCount = entryCount + index.Reserved = reserved + index.FirstEntryOffset = firstEntryOffset + if err := validateEntryCountAllocation(entryCount); err != nil { + return totalRead, err + } + index.Entries = make([]fileformat.IndexEntry, 0, entryCount) + for i := uint32(0); i < entryCount; i++ { + var entry fileformat.IndexEntry + if err := binary.Read(r, binary.LittleEndian, &entry); err != nil { + return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, 
fmt.Sprintf("failed to read entry %d", i), pkgerrors.ValidationErrorContext{ + Field: "Entries", Value: i, Expected: "valid index entry", + }) + } + totalRead += fileformat.IndexEntrySize + index.Entries = append(index.Entries, entry) + } + return totalRead, nil +} + +func readFileIndexHeader(r io.Reader) (entryCount, reserved uint32, firstEntryOffset uint64, totalRead int64, err error) { + if err = binary.Read(r, binary.LittleEndian, &entryCount); err != nil { + if err == io.EOF || err == io.ErrUnexpectedEOF { + return 0, 0, 0, 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to read entry count: incomplete data", pkgerrors.ValidationErrorContext{ + Field: "EntryCount", Value: int64(0), Expected: "4 bytes", + }) + } + return 0, 0, 0, 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read entry count", pkgerrors.ValidationErrorContext{ + Field: "EntryCount", Value: nil, Expected: "4 bytes", + }) + } + totalRead = 4 + if err = binary.Read(r, binary.LittleEndian, &reserved); err != nil { + return 0, 0, 0, totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read reserved", pkgerrors.ValidationErrorContext{ + Field: "Reserved", Value: nil, Expected: "4 bytes", + }) + } + totalRead += 4 + if err = binary.Read(r, binary.LittleEndian, &firstEntryOffset); err != nil { + return 0, 0, 0, totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read first entry offset", pkgerrors.ValidationErrorContext{ + Field: "FirstEntryOffset", Value: nil, Expected: "8 bytes", + }) + } + totalRead += 8 + return entryCount, reserved, firstEntryOffset, totalRead, nil +} + +func validateEntryCountAllocation(entryCount uint32) error { + const maxInt = int(^uint(0) >> 1) + if int(entryCount) > maxInt { + return pkgerrors.WrapErrorWithContext( + fmt.Errorf("entry count %d exceeds maximum slice size %d", entryCount, maxInt), + pkgerrors.ErrTypeValidation, "entry count exceeds system allocation 
limits", + pkgerrors.ValidationErrorContext{Field: "EntryCount", Value: entryCount, Expected: fmt.Sprintf("value <= %d", maxInt)}, + ) + } + requiredBytes := uint64(entryCount) * uint64(fileformat.IndexEntrySize) + if entryCount > 0 && int(entryCount) > maxInt/int(fileformat.IndexEntrySize) { + return pkgerrors.WrapErrorWithContext( + fmt.Errorf("entry count %d would require allocation exceeding maximum slice size", entryCount), + pkgerrors.ErrTypeValidation, "entry count exceeds maximum allocation size", + pkgerrors.ValidationErrorContext{Field: "EntryCount", Value: entryCount, Expected: fmt.Sprintf("value <= %d", maxInt/int(fileformat.IndexEntrySize))}, + ) + } + if requiredBytes <= 1024*1024*1024 { + return nil + } + var memStats runtime.MemStats + runtime.ReadMemStats(&memStats) + maxReasonableAllocation := uint64(10 * 1024 * 1024 * 1024) + if memStats.Sys > 0 && memStats.Sys < maxReasonableAllocation*2 { + maxReasonableAllocation = memStats.Sys / 2 + } + if requiredBytes > maxReasonableAllocation { + return pkgerrors.WrapErrorWithContext( + fmt.Errorf("entry count %d would require %d bytes (%d GB), exceeding available system memory", entryCount, requiredBytes, requiredBytes/(1024*1024*1024)), + pkgerrors.ErrTypeValidation, "entry count exceeds available system memory", + pkgerrors.ValidationErrorContext{Field: "EntryCount", Value: entryCount, Expected: "value within available system memory constraints"}, + ) + } + return nil +} + +func validateFileIndex(index *fileformat.FileIndex) error { + if index == nil { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "file index is nil", nil, struct{}{}) + } + if index.Reserved != 0 { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "reserved field must be zero", nil, pkgerrors.ValidationErrorContext{ + Field: "Reserved", + Value: index.Reserved, + Expected: "0", + }) + } + if index.EntryCount != uint32(len(index.Entries)) { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, 
"entry count mismatch", nil, pkgerrors.ValidationErrorContext{ + Field: "EntryCount", + Value: index.EntryCount, + Expected: fmt.Sprintf("%d", len(index.Entries)), + }) + } + for i, entry := range index.Entries { + if entry.FileID == 0 { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, fmt.Sprintf("file ID at index %d cannot be zero", i), nil, pkgerrors.ValidationErrorContext{ + Field: "Entries", + Value: i, + Expected: "non-zero FileID", + }) + } + } + seen := make(map[uint64]int) + for i, entry := range index.Entries { + if prev, exists := seen[entry.FileID]; exists { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, fmt.Sprintf("duplicate file ID %d at indices %d and %d", entry.FileID, prev, i), nil, pkgerrors.ValidationErrorContext{ + Field: "Entries", + Value: entry.FileID, + Expected: "unique FileID", + }) + } + seen[entry.FileID] = i + } + return nil +} + +func validatePackageHeader(header *fileformat.PackageHeader) error { + if header == nil { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "package header is nil", nil, struct{}{}) + } + if header.Magic != fileformat.NVPKMagic { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "invalid magic number", nil, pkgerrors.ValidationErrorContext{ + Field: "Magic", + Value: fmt.Sprintf("0x%08X", header.Magic), + Expected: fmt.Sprintf("0x%08X", fileformat.NVPKMagic), + }) + } + if header.FormatVersion != fileformat.FormatVersion { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "unsupported format version", nil, pkgerrors.ValidationErrorContext{ + Field: "FormatVersion", + Value: header.FormatVersion, + Expected: fmt.Sprintf("%d", fileformat.FormatVersion), + }) + } + if header.Reserved != 0 { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "reserved field must be 0", nil, pkgerrors.ValidationErrorContext{ + Field: "Reserved", + Value: header.Reserved, + Expected: "0", + }) + } + return nil +} + // Close closes the package and 
releases all resources. // // This method closes the file handle (if open), releases system resources, @@ -757,7 +936,7 @@ func (p *readOnlyPackage) GetPath() string { // } // defer pkg.Close() // Always close to release resources // -// Specification: api_basic_operations.md: 8.1 Close Method +// Specification: api_basic_operations.md: 13. Package.Close Method func (p *filePackage) Close() error { // If already closed, this is a no-op (idempotent) if !p.isOpen && p.fileHandle == nil { @@ -844,7 +1023,7 @@ func (p *filePackage) CreateWithOptions(ctx context.Context, path string, option // Returns: // - error: Error if closing or cleanup fails // -// Specification: api_basic_operations.md: 6.2 Create Method +// Specification: api_basic_operations.md: 14. Package.CloseWithCleanup Method func (p *filePackage) CloseWithCleanup(ctx context.Context) error { // Validate context if err := internal.CheckContext(ctx, "CloseWithCleanup"); err != nil { @@ -907,7 +1086,7 @@ func (p *filePackage) CloseWithCleanup(ctx context.Context) error { // } // fmt.Printf("Format Version: %d\n", header.FormatVersion) // -// Specification: api_basic_operations.md: 9.4 Header Inspection +// Specification: api_basic_operations.md: 18. Header Inspection func ReadHeader(ctx context.Context, reader io.Reader) (*fileformat.PackageHeader, error) { // Validate context if err := internal.CheckContext(ctx, "ReadHeader"); err != nil { @@ -963,7 +1142,7 @@ func ReadHeader(ctx context.Context, reader io.Reader) (*fileformat.PackageHeade // fmt.Printf("Magic: 0x%08X\n", header.Magic) // fmt.Printf("Index Start: %d\n", header.IndexStart) // -// Specification: api_basic_operations.md: 9.4 Header Inspection +// Specification: api_basic_operations.md: 18. 
Header Inspection func ReadHeaderFromPath(ctx context.Context, path string) (*fileformat.PackageHeader, error) { // Validate context if err := internal.CheckContext(ctx, "ReadHeaderFromPath"); err != nil { diff --git a/api/go/novus_package/package_lifecycle_test.go b/api/go/novus_package/package_lifecycle_test.go index a9c5117a..61d079e4 100644 --- a/api/go/novus_package/package_lifecycle_test.go +++ b/api/go/novus_package/package_lifecycle_test.go @@ -7,6 +7,7 @@ package novus_package import ( "context" + "encoding/binary" "os" "path/filepath" "strings" @@ -24,6 +25,8 @@ import ( // TEST: Create Operations // ============================================================================= +const testCommentLifecycle = "test comment" + // TestPackage_Create_Basic tests basic package creation. func TestPackage_Create_Basic(t *testing.T) { ctx := context.Background() @@ -161,7 +164,7 @@ func TestPackage_Create_WithValidPath(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { // Create parent directory if needed - if err := os.MkdirAll(filepath.Dir(tt.path), 0755); err != nil { + if err := os.MkdirAll(filepath.Dir(tt.path), 0o755); err != nil { t.Fatalf("Failed to create parent dir: %v", err) } @@ -247,23 +250,18 @@ func TestPackage_Create_WithWhitespacePath(t *testing.T) { } } -// TestPackage_Create_WithEmptyPath tests Create with empty path. 
-func TestPackage_Create_WithEmptyPath(t *testing.T) { +func runCreateExpectValidationError(t *testing.T, path, desc string) { + t.Helper() ctx := context.Background() - pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } - - // Try to create with empty path fpkg := pkg.(*filePackage) - err = fpkg.Create(ctx, "") + err = fpkg.Create(ctx, path) if err == nil { - t.Fatal("Create() should fail with empty path") + t.Fatalf("Create() should fail with %s", desc) } - - // Verify error type pkgErr := &pkgerrors.PackageError{} if !asPackageError(err, pkgErr) { t.Fatalf("Expected PackageError, got: %T", err) @@ -273,30 +271,14 @@ func TestPackage_Create_WithEmptyPath(t *testing.T) { } } +// TestPackage_Create_WithEmptyPath tests Create with empty path. +func TestPackage_Create_WithEmptyPath(t *testing.T) { + runCreateExpectValidationError(t, "", "empty path") +} + // TestPackage_Create_WithTabOnlyPath tests Create with tab-only path. func TestPackage_Create_WithTabOnlyPath(t *testing.T) { - ctx := context.Background() - - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - - // Try to create with tab-only path (whitespace-only) - fpkg := pkg.(*filePackage) - err = fpkg.Create(ctx, "\t\t\t") - if err == nil { - t.Fatal("Create() should fail with whitespace-only path (tabs)") - } - - // Verify error type - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeValidation { - t.Errorf("Expected error type Validation, got: %v", pkgErr.Type) - } + runCreateExpectValidationError(t, "\t\t\t", "whitespace-only path (tabs)") } // TestPackage_Create_WithReadOnlyDirectory tests Create in read-only directory. 
@@ -310,11 +292,11 @@ func TestPackage_Create_WithReadOnlyDirectory(t *testing.T) { tempDir := t.TempDir() // Make directory read-only - if err := os.Chmod(tempDir, 0444); err != nil { + if err := os.Chmod(tempDir, 0o444); err != nil { t.Fatalf("Failed to make directory read-only: %v", err) } defer func() { - _ = os.Chmod(tempDir, 0755) // Restore permissions for cleanup + _ = os.Chmod(tempDir, 0o755) // Restore permissions for cleanup }() pkg, err := NewPackage() @@ -381,7 +363,7 @@ func TestPackage_Open_ValidatesMagicNumber(t *testing.T) { invalidPath := filepath.Join(tmpDir, "invalid.nvpk") // Setup: Create an invalid file (not a NovusPack file) - err := os.WriteFile(invalidPath, []byte("This is not a NovusPack file"), 0644) + err := os.WriteFile(invalidPath, []byte("This is not a NovusPack file"), 0o644) if err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -396,82 +378,47 @@ func TestPackage_Open_ValidatesMagicNumber(t *testing.T) { } } -// TestPackage_Open_LoadsHeader tests that Open loads the package header. 
-func TestPackage_Open_LoadsHeader(t *testing.T) { - ctx := context.Background() +func runOpenPackageWithTestFile(t *testing.T, ctx context.Context, verify func(t *testing.T, pkg Package)) { + t.Helper() tmpDir := t.TempDir() pkgPath := filepath.Join(tmpDir, "test.nvpk") - - // Setup: Create a package pkg1, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } fpkg1 := pkg1.(*filePackage) - err = fpkg1.Create(ctx, pkgPath) - if err != nil { + if err := fpkg1.Create(ctx, pkgPath); err != nil { t.Fatalf("Create() failed: %v", err) } _ = pkg1.Close() - - // Create the file manually since Create() no longer writes to disk testutil.CreateTestPackageFile(t, pkgPath) - - // Test: Open and verify header is loaded pkg2, err := OpenPackage(ctx, pkgPath) if err != nil { t.Fatalf("OpenPackage() failed: %v", err) } defer func() { _ = pkg2.Close() }() + verify(t, pkg2) +} - info, err := pkg2.GetInfo() +func assertOpenPackageLoadsInfo(t *testing.T, pkg Package) { + t.Helper() + info, err := pkg.GetInfo() if err != nil { t.Errorf("GetInfo() failed: %v", err) } - if info == nil { t.Error("Info should not be nil after Open") } } +// TestPackage_Open_LoadsHeader tests that Open loads the package header. +func TestPackage_Open_LoadsHeader(t *testing.T) { + runOpenPackageWithTestFile(t, context.Background(), assertOpenPackageLoadsInfo) +} + // TestPackage_Open_LoadsFileIndex tests that Open loads the file index. 
func TestPackage_Open_LoadsFileIndex(t *testing.T) { - ctx := context.Background() - tmpDir := t.TempDir() - pkgPath := filepath.Join(tmpDir, "test.nvpk") - - // Setup: Create a package - pkg1, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - fpkg1 := pkg1.(*filePackage) - err = fpkg1.Create(ctx, pkgPath) - if err != nil { - t.Fatalf("Create() failed: %v", err) - } - _ = pkg1.Close() - - // Create the file manually since Create() no longer writes to disk - testutil.CreateTestPackageFile(t, pkgPath) - - // Test: Open and verify file index is accessible - pkg2, err := OpenPackage(ctx, pkgPath) - if err != nil { - t.Fatalf("OpenPackage() failed: %v", err) - } - defer func() { _ = pkg2.Close() }() - - // Verify through GetInfo (which should reflect index data) - info, err := pkg2.GetInfo() - if err != nil { - t.Errorf("GetInfo() failed: %v", err) - } - - // File count should be available (0 for empty package) - if info == nil { - t.Error("Info should not be nil after Open") - } + runOpenPackageWithTestFile(t, context.Background(), assertOpenPackageLoadsInfo) } // TestPackage_Open_ErrorConditions tests various error conditions for Open. 
@@ -507,26 +454,30 @@ func TestPackage_Open_ErrorConditions(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { pkg, err := OpenPackage(ctx, tt.path) - - if tt.shouldError { - if err == nil { - t.Error("Expected error but got none") - if pkg != nil { - _ = pkg.Close() - } - } - } else { - if err != nil { - t.Errorf("Unexpected error: %v", err) - } - if pkg != nil { - _ = pkg.Close() - } - } + assertOpenResult(t, tt.shouldError, pkg, err, "Expected error but got none", "Unexpected error: %v", err) }) } } +func assertOpenResult(t *testing.T, wantErr bool, pkg Package, err error, errNoneMsg, errUnexpectedFmt string, errUnexpectedArgs ...interface{}) { + t.Helper() + if wantErr { + if err == nil { + t.Error(errNoneMsg) + } + if pkg != nil { + _ = pkg.Close() + } + return + } + if err != nil { + t.Errorf(errUnexpectedFmt, errUnexpectedArgs...) + } + if pkg != nil { + _ = pkg.Close() + } +} + // TestPackage_Open_WithContext tests Open with context scenarios. func TestPackage_Open_WithContext(t *testing.T) { tmpDir := t.TempDir() @@ -567,22 +518,7 @@ func TestPackage_Open_WithContext(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { pkg, err := OpenPackage(tt.ctx, pkgPath) - - if tt.shouldError { - if err == nil { - t.Error("Expected error for cancelled context") - if pkg != nil { - _ = pkg.Close() - } - } - } else { - if err != nil { - t.Errorf("Unexpected error: %v", err) - } - if pkg != nil { - _ = pkg.Close() - } - } + assertOpenResult(t, tt.shouldError, pkg, err, "Expected error for cancelled context", "Unexpected error: %v", err) }) } } @@ -654,11 +590,10 @@ func TestPackage_OpenPackage_WithWhitespacePath(t *testing.T) { } } -// TestPackage_OpenPackage_WithCancelledContext tests OpenPackage with cancelled context. 
-func TestPackage_OpenPackage_WithCancelledContext(t *testing.T) { +// runCreatePackageThenWithCancelledContext creates a package file, then runs fn with a cancelled context; fails if fn returns nil error. +func runCreatePackageThenWithCancelledContext(t *testing.T, fn func(context.Context, string) error) { + t.Helper() ctx := context.Background() - - // Create a valid package first tempFile := filepath.Join(t.TempDir(), "test.nvpk") pkg, err := NewPackage() if err != nil { @@ -668,15 +603,21 @@ func TestPackage_OpenPackage_WithCancelledContext(t *testing.T) { if err := fpkg.Create(ctx, tempFile); err != nil { t.Fatalf("Create() failed: %v", err) } - - // Try to open with cancelled context cancelledCtx := testhelpers.CancelledContext() - _, err = OpenPackage(cancelledCtx, tempFile) + err = fn(cancelledCtx, tempFile) if err == nil { - t.Error("OpenPackage() should return error for cancelled context") + t.Error("expected error for cancelled context") } } +// TestPackage_OpenPackage_WithCancelledContext tests OpenPackage with cancelled context. +func TestPackage_OpenPackage_WithCancelledContext(t *testing.T) { + runCreatePackageThenWithCancelledContext(t, func(ctx context.Context, path string) error { + _, err := OpenPackage(ctx, path) + return err + }) +} + // TestPackage_OpenPackage_WithDirectory tests opening a directory instead of a file. 
func TestPackage_OpenPackage_WithDirectory(t *testing.T) { ctx := context.Background() @@ -724,7 +665,7 @@ func TestPackage_OpenPackage_WithCorruptedIndex(t *testing.T) { testutil.CreateTestPackageFile(t, tempFile) // Now corrupt the index by writing invalid data - file, err := os.OpenFile(tempFile, os.O_RDWR, 0644) + file, err := os.OpenFile(tempFile, os.O_RDWR, 0o644) if err != nil { t.Fatalf("Failed to open file for corruption: %v", err) } @@ -818,14 +759,14 @@ func TestPackage_OpenPackage_SeekFailure(t *testing.T) { testutil.CreateTestPackageFile(t, tempFile) // Modify the header to have an invalid index offset (beyond file size) - file, err := os.OpenFile(tempFile, os.O_RDWR, 0644) + file, err := os.OpenFile(tempFile, os.O_RDWR, 0o644) if err != nil { t.Fatalf("Failed to open file: %v", err) } // Read the header header := fileformat.NewPackageHeader() - if _, err := header.ReadFrom(file); err != nil { + if err := headerIO(t, file, header, headerIORead); err != nil { _ = file.Close() t.Fatalf("Failed to read header: %v", err) } @@ -838,7 +779,7 @@ func TestPackage_OpenPackage_SeekFailure(t *testing.T) { _ = file.Close() t.Fatalf("Failed to seek to start: %v", err) } - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write modified header: %v", err) } @@ -875,7 +816,7 @@ func TestPackage_OpenPackage_SeekError(t *testing.T) { header := fileformat.NewPackageHeader() header.IndexStart = 999999 // Invalid index start (beyond file size) header.IndexSize = 100 - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -911,7 +852,7 @@ func TestPackage_OpenPackage_IndexReadError(t *testing.T) { header := fileformat.NewPackageHeader() header.IndexStart = uint64(fileformat.PackageHeaderSize) header.IndexSize = 1000 // Claim large index but file is 
too small - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -959,14 +900,14 @@ func TestPackage_OpenPackage_IndexValidationError(t *testing.T) { {FileID: 1, Offset: 100}, {FileID: 2, Offset: 200}, } - header.IndexSize = uint64(index.Size()) + header.IndexSize = fileIndexSize(index) - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } // Write index with mismatched count - if _, err := index.WriteTo(file); err != nil { + if err := writeTestIndex(t, file, index); err != nil { _ = file.Close() t.Fatalf("Failed to write index: %v", err) } @@ -1004,7 +945,7 @@ func TestPackage_OpenPackage_NoIndex(t *testing.T) { header := fileformat.NewPackageHeader() header.IndexStart = 0 // No index header.IndexSize = 0 - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -1070,7 +1011,7 @@ func TestPackage_OpenPackage_WithIndexValidateError(t *testing.T) { // Now corrupt the index data to make Validate() fail // Open the file and corrupt the index section - file, err := os.OpenFile(tempFile, os.O_RDWR, 0644) + file, err := os.OpenFile(tempFile, os.O_RDWR, 0o644) if err != nil { t.Fatalf("Failed to open file: %v", err) } @@ -1115,7 +1056,7 @@ func TestPackage_OpenPackage_WithIndexReadFromError(t *testing.T) { // Corrupt the file by truncating it right after the header // This will cause index.ReadFrom to fail - file, err := os.OpenFile(tempFile, os.O_RDWR, 0644) + file, err := os.OpenFile(tempFile, os.O_RDWR, 0o644) if err != nil { t.Fatalf("Failed to open file: %v", err) } @@ -1159,7 +1100,7 @@ func TestPackage_OpenPackage_WithInvalidHeaderVersion(t *testing.T) { header := 
fileformat.NewPackageHeader() header.Magic = fileformat.NVPKMagic // Correct magic header.FormatVersion = 999 // Invalid version (too high) - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write invalid header: %v", err) } @@ -1195,7 +1136,7 @@ func TestPackage_CreateWithOptions(t *testing.T) { fpkg := pkg.(*filePackage) options := &CreateOptions{ - Comment: "test comment", + Comment: testCommentLifecycle, VendorID: 1, AppID: 100, } @@ -1212,7 +1153,7 @@ func TestPackage_CreateWithOptions(t *testing.T) { } // Verify options were applied in memory (using methods that don't require package to be open) - if !pkg.HasComment() || pkg.GetComment() != "test comment" { + if !pkg.HasComment() || pkg.GetComment() != testCommentLifecycle { t.Errorf("Comment not set: HasComment=%v, Comment=%v", pkg.HasComment(), pkg.GetComment()) } if pkg.GetVendorID() != 1 { @@ -1275,66 +1216,51 @@ func TestPackage_CreateWithOptions_CommentOnly(t *testing.T) { } } -// TestPackage_CreateWithOptions_VendorIDOnly tests CreateWithOptions with only VendorID set. 
-func TestPackage_CreateWithOptions_VendorIDOnly(t *testing.T) { +func runCreateWithOptionsSingleIdentity(t *testing.T, options *CreateOptions, assertFn func(t *testing.T, pkg Package)) { + t.Helper() ctx := context.Background() pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } - fpkg := pkg.(*filePackage) - options := &CreateOptions{ - VendorID: 42, - } - tempFile := filepath.Join(t.TempDir(), "test.nvpk") err = fpkg.CreateWithOptions(ctx, tempFile, options) if err != nil { t.Fatalf("CreateWithOptions() failed: %v", err) } + assertFn(t, pkg) +} - // Verify VendorID was set (using methods that don't require package to be open) - if pkg.GetVendorID() != 42 { - t.Errorf("VendorID = %v, want 42", pkg.GetVendorID()) +func assertCreateWithOptionsSingleIdentityFields(t *testing.T, pkg Package, wantVendorID uint32, wantAppID uint64) { + t.Helper() + if pkg.GetVendorID() != wantVendorID { + t.Errorf("VendorID = %v, want %v", pkg.GetVendorID(), wantVendorID) + } + if pkg.GetAppID() != wantAppID { + t.Errorf("AppID = %v, want %v", pkg.GetAppID(), wantAppID) } if pkg.HasComment() { t.Error("HasComment should be false when Comment is empty") } - if pkg.GetAppID() != 0 { - t.Errorf("AppID should be 0, got %v", pkg.GetAppID()) - } } -// TestPackage_CreateWithOptions_AppIDOnly tests CreateWithOptions with only AppID set. 
-func TestPackage_CreateWithOptions_AppIDOnly(t *testing.T) { - ctx := context.Background() - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } +func assertCreateWithOptionsVendorIDOnly(t *testing.T, pkg Package) { + assertCreateWithOptionsSingleIdentityFields(t, pkg, 42, 0) +} - fpkg := pkg.(*filePackage) - options := &CreateOptions{ - AppID: 999, - } +func assertCreateWithOptionsAppIDOnly(t *testing.T, pkg Package) { + assertCreateWithOptionsSingleIdentityFields(t, pkg, 0, 999) +} - tempFile := filepath.Join(t.TempDir(), "test.nvpk") - err = fpkg.CreateWithOptions(ctx, tempFile, options) - if err != nil { - t.Fatalf("CreateWithOptions() failed: %v", err) - } +// TestPackage_CreateWithOptions_VendorIDOnly tests CreateWithOptions with only VendorID set. +func TestPackage_CreateWithOptions_VendorIDOnly(t *testing.T) { + runCreateWithOptionsSingleIdentity(t, &CreateOptions{VendorID: 42}, assertCreateWithOptionsVendorIDOnly) +} - // Verify AppID was set (using methods that don't require package to be open) - if pkg.GetAppID() != 999 { - t.Errorf("AppID = %v, want 999", pkg.GetAppID()) - } - if pkg.HasComment() { - t.Error("HasComment should be false when Comment is empty") - } - if pkg.GetVendorID() != 0 { - t.Errorf("VendorID should be 0, got %v", pkg.GetVendorID()) - } +// TestPackage_CreateWithOptions_AppIDOnly tests CreateWithOptions with only AppID set. +func TestPackage_CreateWithOptions_AppIDOnly(t *testing.T) { + runCreateWithOptionsSingleIdentity(t, &CreateOptions{AppID: 999}, assertCreateWithOptionsAppIDOnly) } // TestPackage_CreateWithOptions_CancelledContext tests CreateWithOptions with cancelled context. @@ -1662,16 +1588,11 @@ func TestPackage_CloseWithCleanup_CloseErrorPropagation(t *testing.T) { // TEST: Close Operations // ============================================================================= -// TestPackage_Close_Basic tests basic package closing. 
-// -// Expected behavior (Red Phase - should FAIL): -// - Close method does not exist -func TestPackage_Close_Basic(t *testing.T) { +func runCloseSucceedsAndClearsOpen(t *testing.T) { + t.Helper() ctx := context.Background() tmpDir := t.TempDir() pkgPath := filepath.Join(tmpDir, "test.nvpk") - - // Setup: Create and open a package pkg1, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) @@ -1681,19 +1602,23 @@ func TestPackage_Close_Basic(t *testing.T) { if err != nil { t.Fatalf("Create() failed: %v", err) } - - // Test: Close should succeed err = pkg1.Close() if err != nil { t.Errorf("Close() failed: %v", err) } - - // Test: Package should not be open after Close if pkg1.IsOpen() { t.Error("Package should not be open after Close") } } +// TestPackage_Close_Basic tests basic package closing. +// +// Expected behavior (Red Phase - should FAIL): +// - Close method does not exist +func TestPackage_Close_Basic(t *testing.T) { + runCloseSucceedsAndClearsOpen(t) +} + // TestPackage_Close_ClosesFileHandle tests that Close releases file handle. 
// // Expected behavior (Red Phase - should FAIL): @@ -1831,31 +1756,7 @@ func TestPackage_Close_Multiple(t *testing.T) { // Expected behavior (Red Phase - should FAIL): // - Resource cleanup not implemented func TestPackage_Close_ResourceCleanup(t *testing.T) { - ctx := context.Background() - tmpDir := t.TempDir() - pkgPath := filepath.Join(tmpDir, "test.nvpk") - - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - fpkg := pkg.(*filePackage) - err = fpkg.Create(ctx, pkgPath) - if err != nil { - t.Fatalf("Create() failed: %v", err) - } - - // Close should release all resources - err = pkg.Close() - if err != nil { - t.Errorf("Close() failed: %v", err) - } - - // Test: State should be cleared - if pkg.IsOpen() { - t.Error("IsOpen should return false after Close") - } - + runCloseSucceedsAndClearsOpen(t) // Note: We can't directly verify memory/buffer cleanup without instrumentation, // but we verify the package is unusable after Close } @@ -2080,6 +1981,8 @@ func TestPackage_OpenPackageReadOnly(t *testing.T) { } // TestPackage_OpenPackageReadOnly_RejectsMutation tests that OpenPackageReadOnly rejects mutation operations. 
+// +//nolint:gocognit // table-driven mutation cases func TestPackage_OpenPackageReadOnly_RejectsMutation(t *testing.T) { ctx := context.Background() tempFile := filepath.Join(t.TempDir(), "test.nvpk") @@ -2453,7 +2356,7 @@ func TestPackage_OpenPackage_ResourceCleanupOnError(t *testing.T) { t.Fatalf("Failed to create file: %v", err) } // Write invalid header - _, _ = file.Write([]byte("INVALID")) + _, _ = file.WriteString("INVALID") _ = file.Close() _, err = OpenPackage(ctx, tempFile) @@ -2482,7 +2385,7 @@ func TestPackage_OpenPackage_IndexStartZero(t *testing.T) { header.IndexStart = 0 header.IndexSize = 0 - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -2515,7 +2418,7 @@ func TestPackage_OpenPackage_IndexReadFailure(t *testing.T) { header.IndexStart = uint64(fileformat.PackageHeaderSize) header.IndexSize = 20 // Claim index exists (16 bytes header + at least 16 bytes for 1 entry) - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -2575,13 +2478,13 @@ func TestPackage_OpenPackage_IndexValidationFailure(t *testing.T) { {FileID: 1, Offset: 100}, {FileID: 1, Offset: 200}, // Duplicate FileID - will fail validation } - header.IndexSize = uint64(index.Size()) + header.IndexSize = fileIndexSize(index) - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } - if _, err := index.WriteTo(file); err != nil { + if err := writeTestIndex(t, file, index); err != nil { _ = file.Close() t.Fatalf("Failed to write index: %v", err) } @@ -2621,7 +2524,7 @@ func TestPackage_OpenBrokenPackage_ValidHeader(t *testing.T) { header := fileformat.NewPackageHeader() header.IndexStart = 
uint64(fileformat.PackageHeaderSize) header.IndexSize = 100 - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -2701,7 +2604,7 @@ func TestPackage_OpenBrokenPackage_NoIndex(t *testing.T) { header := fileformat.NewPackageHeader() header.IndexStart = 0 header.IndexSize = 0 - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -2743,7 +2646,7 @@ func TestPackage_OpenBrokenPackage_ReadFileDoesNotPanic(t *testing.T) { header := fileformat.NewPackageHeader() header.IndexStart = 0 header.IndexSize = 0 - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -2880,7 +2783,7 @@ func TestPackage_ReadHeader_ValidatesMagic(t *testing.T) { invalidPath := filepath.Join(tmpDir, "invalid.nvpk") // Setup: Create invalid file - err := os.WriteFile(invalidPath, []byte("Not a NovusPack file"), 0644) + err := os.WriteFile(invalidPath, []byte("Not a NovusPack file"), 0o644) if err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -2894,24 +2797,10 @@ func TestPackage_ReadHeader_ValidatesMagic(t *testing.T) { // TestPackage_ReadHeader_WithCancelledContext tests ReadHeader with cancelled context. 
func TestPackage_ReadHeader_WithCancelledContext(t *testing.T) { - ctx := context.Background() - // Create a valid package - tempFile := filepath.Join(t.TempDir(), "test.nvpk") - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - fpkg := pkg.(*filePackage) - if err := fpkg.Create(ctx, tempFile); err != nil { - t.Fatalf("Create() failed: %v", err) - } - - // Try to read header with cancelled context - cancelledCtx := testhelpers.CancelledContext() - _, err = ReadHeaderFromPath(cancelledCtx, tempFile) - if err == nil { - t.Error("ReadHeader() should return error for cancelled context") - } + runCreatePackageThenWithCancelledContext(t, func(ctx context.Context, path string) error { + _, err := ReadHeaderFromPath(ctx, path) + return err + }) } // TestPackage_ReadHeader_WithInvalidMagic tests ReadHeader with invalid magic number. @@ -2934,7 +2823,7 @@ func TestPackage_ReadHeader_WithInvalidMagic(t *testing.T) { testutil.CreateTestPackageFile(t, tempFile) // Now modify just the magic number - file, err := os.OpenFile(tempFile, os.O_RDWR, 0644) + file, err := os.OpenFile(tempFile, os.O_RDWR, 0o644) if err != nil { t.Fatalf("Failed to open file: %v", err) } @@ -3115,7 +3004,7 @@ func TestReadAndValidateHeader_UnsupportedVersion(t *testing.T) { testutil.CreateTestPackageFile(t, tempFile) // Modify the file to have unsupported version - file, err := os.OpenFile(tempFile, os.O_RDWR, 0644) + file, err := os.OpenFile(tempFile, os.O_RDWR, 0o644) if err != nil { t.Fatalf("Failed to open file: %v", err) } @@ -3203,7 +3092,7 @@ func TestReadAndValidateHeader_WithHeaderValidateError(t *testing.T) { header := fileformat.NewPackageHeader() header.Magic = fileformat.NVPKMagic header.FormatVersion = 999 // Invalid version that might fail validation - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -3281,7 +3170,7 @@ 
func TestReadAndValidateHeader_WithMagicCheckAfterReadFrom(t *testing.T) { // Write a header with wrong magic (but valid header structure) header := fileformat.NewPackageHeader() header.Magic = 0xDEADBEEF // Wrong magic - if _, err := header.WriteTo(file); err != nil { + if err := headerIO(t, file, header, headerIOWrite); err != nil { _ = file.Close() t.Fatalf("Failed to write header: %v", err) } @@ -3352,3 +3241,41 @@ func TestReadAndValidateHeader_WithReadFromMagicError(t *testing.T) { t.Errorf("Expected validation error, got: %v", pkgErr.Type) } } + +type headerIOOp uint8 + +const ( + headerIORead headerIOOp = iota + headerIOWrite +) + +func headerIO(t *testing.T, file *os.File, header *fileformat.PackageHeader, op headerIOOp) error { + t.Helper() + switch op { + case headerIORead: + return binary.Read(file, binary.LittleEndian, header) + case headerIOWrite: + return binary.Write(file, binary.LittleEndian, header) + default: + return nil + } +} + +func writeTestIndex(t *testing.T, file *os.File, index *fileformat.FileIndex) error { + t.Helper() + if err := binary.Write(file, binary.LittleEndian, index.EntryCount); err != nil { + return err + } + if err := binary.Write(file, binary.LittleEndian, index.Reserved); err != nil { + return err + } + if err := binary.Write(file, binary.LittleEndian, index.FirstEntryOffset); err != nil { + return err + } + for i := range index.Entries { + if err := binary.Write(file, binary.LittleEndian, index.Entries[i]); err != nil { + return err + } + } + return nil +} diff --git a/api/go/novus_package/package_path_canonicalization_integration_test.go b/api/go/novus_package/package_path_canonicalization_integration_test.go index 40f6c9cd..20f28dd9 100644 --- a/api/go/novus_package/package_path_canonicalization_integration_test.go +++ b/api/go/novus_package/package_path_canonicalization_integration_test.go @@ -3,11 +3,12 @@ // It verifies that the path canonicalization logic correctly handles dot segments // in real package operations. 
// -// Specification: api_core.md: 1.1.2 Package Path Semantics +// Specification: api_core.md: 2. Package Path Semantics package novus_package import ( + "bytes" "context" "os" "path/filepath" @@ -24,7 +25,7 @@ func TestPathCanonicalization_AddFile_Integration(t *testing.T) { tmpDir := t.TempDir() testFile := filepath.Join(tmpDir, "test.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { + if err := os.WriteFile(testFile, []byte("content"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -81,7 +82,7 @@ func TestPathCanonicalization_ReadFile_Integration(t *testing.T) { t.Fatalf("ReadFile failed: %v", err) } - if string(data) != string(testContent) { + if !bytes.Equal(data, testContent) { t.Errorf("ReadFile content mismatch: got %q, want %q", string(data), string(testContent)) } } @@ -164,7 +165,7 @@ func TestPathCanonicalization_RoundTrip(t *testing.T) { t.Fatalf("ReadFile failed: %v", err) } - if string(data) != string(testContent) { + if !bytes.Equal(data, testContent) { t.Errorf("Content mismatch: got %q, want %q", string(data), string(testContent)) } diff --git a/api/go/novus_package/package_path_metadata_associations.go b/api/go/novus_package/package_path_metadata_associations.go index 80605edf..de1e24e7 100644 --- a/api/go/novus_package/package_path_metadata_associations.go +++ b/api/go/novus_package/package_path_metadata_associations.go @@ -28,7 +28,7 @@ import ( // - error: *PackageError on failure // // Specification: api_metadata.md: 8.2 PathMetadata Management Methods -func (p *filePackage) AssociateFileWithPath(ctx context.Context, filePath string, path string) error { +func (p *filePackage) AssociateFileWithPath(ctx context.Context, filePath, path string) error { if err := internal.CheckContext(ctx, "AssociateFileWithPath"); err != nil { return err } @@ -69,6 +69,8 @@ func (p *filePackage) AssociateFileWithPath(ctx context.Context, filePath string // - error: *PackageError on failure // // 
Specification: api_metadata.md: 8.2 PathMetadata Management Methods +// +//nolint:gocognit // lookup and update branches func (p *filePackage) DisassociateFileFromPath(ctx context.Context, filePath string) error { if err := internal.CheckContext(ctx, "DisassociateFileFromPath"); err != nil { return err @@ -115,6 +117,8 @@ func (p *filePackage) DisassociateFileFromPath(ctx context.Context, filePath str // - error: *PackageError on failure // // Specification: api_metadata.md: 8.2 PathMetadata Management Methods +// +//nolint:gocognit // iteration and update branches func (p *filePackage) UpdateFilePathAssociations(ctx context.Context) error { if err := internal.CheckContext(ctx, "UpdateFilePathAssociations"); err != nil { return err diff --git a/api/go/novus_package/package_path_metadata_associations_test.go b/api/go/novus_package/package_path_metadata_associations_test.go index 432dd42f..61e5e885 100644 --- a/api/go/novus_package/package_path_metadata_associations_test.go +++ b/api/go/novus_package/package_path_metadata_associations_test.go @@ -425,28 +425,9 @@ func TestPackage_UpdateFilePathAssociations_MultiplePaths(t *testing.T) { // TestPackage_UpdateFilePathAssociations_CancelledContext tests UpdateFilePathAssociations with cancelled context. 
// Expected: Should return context error func TestPackage_UpdateFilePathAssociations_CancelledContext(t *testing.T) { - cancelledCtx := testhelpers.CancelledContext() - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - - // UpdateFilePathAssociations should fail with cancelled context - err = fpkg.UpdateFilePathAssociations(cancelledCtx) - if err == nil { - t.Fatal("UpdateFilePathAssociations should fail with cancelled context") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeContext { - t.Errorf("Error type = %v, want ErrTypeContext", pkgErr.Type) - } + runContextCancelledTest(t, func(fpkg *filePackage, ctx context.Context) error { + return fpkg.UpdateFilePathAssociations(ctx) + }) } // ============================================================================= @@ -567,17 +548,7 @@ func TestPackage_GetFilePathAssociations_WithNilEntries(t *testing.T) { // TestPackage_GetFilePathAssociations_WithContext tests GetFilePathAssociations with cancelled context. 
func TestPackage_GetFilePathAssociations_WithContext(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - // Test with cancelled context - cancelledCtx := testhelpers.CancelledContext() - fpkg := pkg.(*filePackage) - _, err = fpkg.GetFilePathAssociations(cancelledCtx) - if err == nil { - t.Error("GetFilePathAssociations should fail with cancelled context") - } + runWithCancelledContext(t, func(fpkg *filePackage, ctx context.Context) (interface{}, error) { + return fpkg.GetFilePathAssociations(ctx) + }, "GetFilePathAssociations") } diff --git a/api/go/novus_package/package_path_metadata_directories.go b/api/go/novus_package/package_path_metadata_directories.go index da279595..1b8b3589 100644 --- a/api/go/novus_package/package_path_metadata_directories.go +++ b/api/go/novus_package/package_path_metadata_directories.go @@ -118,6 +118,8 @@ func (p *filePackage) UpdateDirectoryMetadata(ctx context.Context, path string, // - error: *PackageError on failure // // Specification: api_metadata.md: 8.2 PathMetadata Management Methods +// +//nolint:gocognit // hierarchy walk branches func (p *filePackage) ListDirectories() ([]PathInfo, error) { // This is an in-memory operation directories := make(map[string]*PathInfo) diff --git a/api/go/novus_package/package_path_metadata_directories_test.go b/api/go/novus_package/package_path_metadata_directories_test.go index c1a94783..38e0b011 100644 --- a/api/go/novus_package/package_path_metadata_directories_test.go +++ b/api/go/novus_package/package_path_metadata_directories_test.go @@ -11,15 +11,16 @@ import ( "testing" "github.com/novus-engine/novuspack/api/go/fileformat" - "github.com/novus-engine/novuspack/api/go/internal/testhelpers" "github.com/novus-engine/novuspack/api/go/metadata" - "github.com/novus-engine/novuspack/api/go/pkgerrors" ) // ============================================================================= // TEST: 
AddDirectoryMetadata // ============================================================================= +const testDirPath = "test/dir" +const testDirPathSlash = "test/dir/" + // TestPackage_AddDirectoryMetadata_Basic tests basic AddDirectoryMetadata operation. func TestPackage_AddDirectoryMetadata_Basic(t *testing.T) { ctx := context.Background() @@ -36,7 +37,7 @@ func TestPackage_AddDirectoryMetadata_Basic(t *testing.T) { fpkg.Info = metadata.NewPackageInfo() fpkg.SpecialFiles = make(map[uint16]*metadata.FileEntry) fpkg.PathMetadataEntries = make([]*metadata.PathMetadataEntry, 0) - err = fpkg.AddDirectoryMetadata(ctx, "test/dir", nil, nil, nil) + err = fpkg.AddDirectoryMetadata(ctx, testDirPath, nil, nil, nil) if err != nil { t.Errorf("AddDirectoryMetadata failed: %v", err) } @@ -57,7 +58,7 @@ func TestPackage_RemoveDirectoryMetadata_Basic(t *testing.T) { // RemoveDirectoryMetadata should return error when LoadPathMetadataFile is not implemented fpkg := pkg.(*filePackage) - err = fpkg.RemoveDirectoryMetadata(ctx, "test/dir") + err = fpkg.RemoveDirectoryMetadata(ctx, testDirPath) if err == nil { t.Error("RemoveDirectoryMetadata should return error when LoadPathMetadataFile is not implemented") } @@ -78,7 +79,7 @@ func TestPackage_UpdateDirectoryMetadata_Basic(t *testing.T) { // UpdateDirectoryMetadata should return error when LoadPathMetadataFile is not implemented fpkg := pkg.(*filePackage) - err = fpkg.UpdateDirectoryMetadata(ctx, "test/dir", nil, nil, nil) + err = fpkg.UpdateDirectoryMetadata(ctx, testDirPath, nil, nil, nil) if err == nil { t.Error("UpdateDirectoryMetadata should return error when LoadPathMetadataFile is not implemented") } @@ -88,20 +89,24 @@ func TestPackage_UpdateDirectoryMetadata_Basic(t *testing.T) { // TEST: ListDirectories // ============================================================================= -// TestPackage_ListDirectories_Basic tests basic ListDirectories operation. 
-func TestPackage_ListDirectories_Basic(t *testing.T) { +func setupOpenFilePackageForPathMetadata(t *testing.T) (Package, *filePackage) { + t.Helper() pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage() failed: %v", err) } - defer func() { _ = pkg.Close() }() - - // ListDirectories should now succeed (in-memory operation) fpkg := pkg.(*filePackage) fpkg.isOpen = true fpkg.SpecialFiles = make(map[uint16]*metadata.FileEntry) fpkg.PathMetadataEntries = make([]*metadata.PathMetadataEntry, 0) - _, err = fpkg.ListDirectories() + return pkg, fpkg +} + +// TestPackage_ListDirectories_Basic tests basic ListDirectories operation. +func TestPackage_ListDirectories_Basic(t *testing.T) { + pkg, fpkg := setupOpenFilePackageForPathMetadata(t) + defer func() { _ = pkg.Close() }() + _, err := fpkg.ListDirectories() if err != nil { t.Errorf("ListDirectories failed: %v", err) } @@ -112,6 +117,8 @@ func TestPackage_ListDirectories_Basic(t *testing.T) { // ============================================================================= // TestPackage_AddDirectoryMetadata_PathNormalization tests AddDirectoryMetadata path normalization. 
+// +//nolint:gocognit // table-driven path normalization func TestPackage_AddDirectoryMetadata_PathNormalization(t *testing.T) { ctx := context.Background() pkg, err := NewPackage() @@ -124,7 +131,7 @@ func TestPackage_AddDirectoryMetadata_PathNormalization(t *testing.T) { fpkg.PathMetadataEntries = []*metadata.PathMetadataEntry{} // Test path that doesn't end with / - pathWithoutSlash := "test/dir" + pathWithoutSlash := testDirPath err = fpkg.AddDirectoryMetadata(ctx, pathWithoutSlash, nil, nil, nil) // Should fail because SavePathMetadataFile is not implemented, but path should be normalized if err == nil { @@ -138,7 +145,7 @@ func TestPackage_AddDirectoryMetadata_PathNormalization(t *testing.T) { } // Test path that already ends with / - pathWithSlash := "test/dir/" + pathWithSlash := testDirPathSlash fpkg.PathMetadataEntries = []*metadata.PathMetadataEntry{} err = fpkg.AddDirectoryMetadata(ctx, pathWithSlash, nil, nil, nil) // Should fail because SavePathMetadataFile is not implemented @@ -179,12 +186,12 @@ func TestPackage_RemoveDirectoryMetadata_PathNormalization(t *testing.T) { fpkg := pkg.(*filePackage) // Test path that doesn't end with / - pathWithoutSlash := "test/dir" + pathWithoutSlash := testDirPath _ = fpkg.RemoveDirectoryMetadata(ctx, pathWithoutSlash) // Should fail because LoadPathMetadataFile is not implemented // Test path that already ends with / - pathWithSlash := "test/dir/" + pathWithSlash := testDirPathSlash _ = fpkg.RemoveDirectoryMetadata(ctx, pathWithSlash) // Should fail because LoadPathMetadataFile is not implemented } @@ -201,12 +208,12 @@ func TestPackage_UpdateDirectoryMetadata_PathNormalization(t *testing.T) { fpkg := pkg.(*filePackage) // Test path that doesn't end with / - pathWithoutSlash := "test/dir" + pathWithoutSlash := testDirPath _ = fpkg.UpdateDirectoryMetadata(ctx, pathWithoutSlash, nil, nil, nil) // Should fail because LoadPathMetadataFile is not implemented // Test path that already ends with / - pathWithSlash := 
"test/dir/" + pathWithSlash := testDirPathSlash _ = fpkg.UpdateDirectoryMetadata(ctx, pathWithSlash, nil, nil, nil) // Should fail because LoadPathMetadataFile is not implemented } @@ -239,122 +246,53 @@ func TestPackage_ListDirectories_Success(t *testing.T) { } } -// TestPackage_ListDirectories_Empty tests ListDirectories with no directories. -func TestPackage_ListDirectories_Empty(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } +func runListDirectoriesWithEntries(t *testing.T, entries []*metadata.PathMetadataEntry, wantCount int, errMsg string) { + t.Helper() + pkg, fpkg := setupOpenFilePackageForPathMetadata(t) defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - // Create only files, no directories - file1 := createValidPathMetadataEntry("file1.txt", metadata.PathMetadataTypeFile) - fpkg.PathMetadataEntries = []*metadata.PathMetadataEntry{file1} - + fpkg.PathMetadataEntries = entries directories, err := fpkg.ListDirectories() if err != nil { - t.Errorf("ListDirectories should succeed, got error: %v", err) + t.Errorf("ListDirectories %s: %v", errMsg, err) } - if len(directories) != 0 { - t.Errorf("ListDirectories should return 0 directories, got %d", len(directories)) + if len(directories) != wantCount { + t.Errorf("ListDirectories want %d directories, got %d", wantCount, len(directories)) } } +// TestPackage_ListDirectories_Empty tests ListDirectories with no directories. 
+func TestPackage_ListDirectories_Empty(t *testing.T) { + file1 := createValidPathMetadataEntry("file1.txt", metadata.PathMetadataTypeFile) + runListDirectoriesWithEntries(t, []*metadata.PathMetadataEntry{file1}, 0, "should succeed") +} + // ============================================================================= // TEST: Context Cancellation // ============================================================================= // TestPackage_AddDirectoryMetadata_ContextCancelled tests AddDirectoryMetadata with cancelled context. func TestPackage_AddDirectoryMetadata_ContextCancelled(t *testing.T) { - ctx := testhelpers.CancelledContext() - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - err = fpkg.AddDirectoryMetadata(ctx, "test/dir", nil, nil, nil) - if err == nil { - t.Error("AddDirectoryMetadata() should fail with cancelled context") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeContext { - t.Errorf("Expected error type Context, got: %v", pkgErr.Type) - } + runContextCancelledTest(t, func(fpkg *filePackage, ctx context.Context) error { + return fpkg.AddDirectoryMetadata(ctx, testDirPath, nil, nil, nil) + }) } // TestPackage_RemoveDirectoryMetadata_ContextCancelled tests RemoveDirectoryMetadata with cancelled context. 
func TestPackage_RemoveDirectoryMetadata_ContextCancelled(t *testing.T) { - ctx := testhelpers.CancelledContext() - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - err = fpkg.RemoveDirectoryMetadata(ctx, "test/dir") - if err == nil { - t.Error("RemoveDirectoryMetadata() should fail with cancelled context") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeContext { - t.Errorf("Expected error type Context, got: %v", pkgErr.Type) - } + runContextCancelledTest(t, func(fpkg *filePackage, ctx context.Context) error { + return fpkg.RemoveDirectoryMetadata(ctx, testDirPath) + }) } // TestPackage_UpdateDirectoryMetadata_ContextCancelled tests UpdateDirectoryMetadata with cancelled context. func TestPackage_UpdateDirectoryMetadata_ContextCancelled(t *testing.T) { - ctx := testhelpers.CancelledContext() - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - err = fpkg.UpdateDirectoryMetadata(ctx, "test/dir", nil, nil, nil) - if err == nil { - t.Error("UpdateDirectoryMetadata() should fail with cancelled context") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeContext { - t.Errorf("Expected error type Context, got: %v", pkgErr.Type) - } + runContextCancelledTest(t, func(fpkg *filePackage, ctx context.Context) error { + return fpkg.UpdateDirectoryMetadata(ctx, testDirPath, nil, nil, nil) + }) } // TestPackage_ListDirectories_InMemoryOperation tests that ListDirectories is a pure in-memory operation. 
func TestPackage_ListDirectories_InMemoryOperation(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) dir1 := createValidPathMetadataEntry("dir1/", metadata.PathMetadataTypeDirectory) - fpkg.PathMetadataEntries = []*metadata.PathMetadataEntry{dir1} - - // ListDirectories is now a pure in-memory operation and does not require context - directories, err := fpkg.ListDirectories() - if err != nil { - t.Errorf("ListDirectories should succeed as in-memory operation, got error: %v", err) - } - if len(directories) != 1 { - t.Errorf("ListDirectories should return 1 directory, got %d", len(directories)) - } + runListDirectoriesWithEntries(t, []*metadata.PathMetadataEntry{dir1}, 1, "should succeed as in-memory operation") } diff --git a/api/go/novus_package/package_path_metadata_files.go b/api/go/novus_package/package_path_metadata_files.go index c462b242..87a832ed 100644 --- a/api/go/novus_package/package_path_metadata_files.go +++ b/api/go/novus_package/package_path_metadata_files.go @@ -95,7 +95,7 @@ func (p *filePackage) SavePathMetadataFile(ctx context.Context) error { specialFile.FileID = nextFileID // Sequential, unique FileID specialFile.Type = 65001 specialFile.Paths = []generics.PathEntry{ - {Path: "__NVPK_PATH_65001__.nvpkpath", PathLength: 27}, + {PathLength: uint16(len("/__NVPK_PATH_65001__.nvpkpath")), Path: "/__NVPK_PATH_65001__.nvpkpath"}, } specialFile.CompressionType = 0 // No compression (uncompressed) specialFile.EncryptionType = 0 // No encryption diff --git a/api/go/novus_package/package_path_metadata_files_test.go b/api/go/novus_package/package_path_metadata_files_test.go index df6be7f4..e26ae068 100644 --- a/api/go/novus_package/package_path_metadata_files_test.go +++ b/api/go/novus_package/package_path_metadata_files_test.go @@ -64,8 +64,8 @@ func TestPackage_SavePathMetadataFile_CreateNew(t *testing.T) { if 
len(specialFile.Paths) == 0 { t.Fatal("Special file should have path") } - if specialFile.Paths[0].Path != "__NVPK_PATH_65001__.nvpkpath" { - t.Errorf("Special file path = %s, want '__NVPK_PATH_65001__.nvpkpath'", specialFile.Paths[0].Path) + if specialFile.Paths[0].Path != "/__NVPK_PATH_65001__.nvpkpath" { + t.Errorf("Special file path = %s, want '/__NVPK_PATH_65001__.nvpkpath'", specialFile.Paths[0].Path) } // Verify compression type is uncompressed (per spec) if specialFile.CompressionType != 0 { @@ -107,7 +107,7 @@ func TestPackage_SavePathMetadataFile_UpdateExisting(t *testing.T) { existingFile.FileID = 1 // Sequential FileID existingFile.Type = 65001 existingFile.Paths = []generics.PathEntry{ - {Path: "__NVPK_PATH_65001__.nvpkpath", PathLength: 27}, + {Path: "/__NVPK_PATH_65001__.nvpkpath", PathLength: uint16(len("/__NVPK_PATH_65001__.nvpkpath"))}, } existingFile.Data = []byte("old data") fpkg.SpecialFiles[65001] = existingFile diff --git a/api/go/novus_package/package_path_metadata_helpers.go b/api/go/novus_package/package_path_metadata_helpers.go index e64b5fab..5a47000c 100644 --- a/api/go/novus_package/package_path_metadata_helpers.go +++ b/api/go/novus_package/package_path_metadata_helpers.go @@ -21,14 +21,14 @@ import ( // Returns: // - *metadata.FileEntry: The found FileEntry, or nil if not found // - error: *PackageError if file not found -func (p *filePackage) findFileEntryByPath(path string) (*metadata.FileEntry, error) { +func (p *filePackage) findFileEntryByPath(pathStr string) (*metadata.FileEntry, error) { for _, fe := range p.FileEntries { if fe == nil { continue } // Check if any of FileEntry's paths match the search path for _, pe := range fe.Paths { - if pe.Path == path { + if pe.Path == pathStr { return fe, nil } } @@ -40,7 +40,7 @@ func (p *filePackage) findFileEntryByPath(path string) (*metadata.FileEntry, err nil, pkgerrors.ValidationErrorContext{ Field: "FilePath", - Value: path, + Value: pathStr, Expected: "existing file path", }, ) @@ 
-54,9 +54,9 @@ func (p *filePackage) findFileEntryByPath(path string) (*metadata.FileEntry, err // Returns: // - *metadata.PathMetadataEntry: The found PathMetadataEntry, or nil if not found // - error: *PackageError if path metadata not found -func (p *filePackage) findPathMetadataByPath(path string) (*metadata.PathMetadataEntry, error) { +func (p *filePackage) findPathMetadataByPath(pathStr string) (*metadata.PathMetadataEntry, error) { for _, pme := range p.PathMetadataEntries { - if pme != nil && pme.Path.Path == path { + if pme != nil && pme.Path.Path == pathStr { return pme, nil } } @@ -67,7 +67,7 @@ func (p *filePackage) findPathMetadataByPath(path string) (*metadata.PathMetadat nil, pkgerrors.ValidationErrorContext{ Field: "Path", - Value: path, + Value: pathStr, Expected: "existing path metadata entry", }, ) @@ -113,7 +113,7 @@ func (p *filePackage) setParentPathAssociation(pme *metadata.PathMetadataEntry) // Strip trailing slash for directory paths before computing parent pathForDir := currentPath - if len(pathForDir) > 0 && pathForDir[len(pathForDir)-1] == '/' { + if pathForDir != "" && pathForDir[len(pathForDir)-1] == '/' { pathForDir = pathForDir[:len(pathForDir)-1] } diff --git a/api/go/novus_package/package_path_metadata_hierarchy_test.go b/api/go/novus_package/package_path_metadata_hierarchy_test.go index 16d4f28e..d24c8669 100644 --- a/api/go/novus_package/package_path_metadata_hierarchy_test.go +++ b/api/go/novus_package/package_path_metadata_hierarchy_test.go @@ -42,18 +42,9 @@ func TestPackage_GetPathInfo_Basic(t *testing.T) { // TestPackage_ListPaths_Basic tests basic ListPaths operation. 
func TestPackage_ListPaths_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } + pkg, fpkg := setupOpenFilePackageForPathMetadata(t) defer func() { _ = pkg.Close() }() - - // ListPaths should now succeed since LoadPathMetadataFile is implemented - fpkg := pkg.(*filePackage) - fpkg.isOpen = true - fpkg.SpecialFiles = make(map[uint16]*metadata.FileEntry) - fpkg.PathMetadataEntries = make([]*metadata.PathMetadataEntry, 0) - _, err = fpkg.ListPaths() + _, err := fpkg.ListPaths() if err != nil { t.Errorf("ListPaths failed: %v", err) } @@ -65,18 +56,9 @@ func TestPackage_ListPaths_Basic(t *testing.T) { // TestPackage_GetPathHierarchy_Basic tests basic GetPathHierarchy operation. func TestPackage_GetPathHierarchy_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } + pkg, fpkg := setupOpenFilePackageForPathMetadata(t) defer func() { _ = pkg.Close() }() - - // GetPathHierarchy should now succeed since LoadPathMetadataFile is implemented - fpkg := pkg.(*filePackage) - fpkg.isOpen = true - fpkg.SpecialFiles = make(map[uint16]*metadata.FileEntry) - fpkg.PathMetadataEntries = make([]*metadata.PathMetadataEntry, 0) - _, err = fpkg.GetPathHierarchy() + _, err := fpkg.GetPathHierarchy() if err != nil { t.Errorf("GetPathHierarchy failed: %v", err) } @@ -86,6 +68,8 @@ func TestPackage_GetPathHierarchy_Basic(t *testing.T) { // TEST: GetPathInfo with cached entries // ============================================================================= +const parentPathStr = "parent" + // TestPackage_GetPathInfo_NotFound tests GetPathInfo when path not found. 
func TestPackage_GetPathInfo_NotFound(t *testing.T) { pkg, err := NewPackage() @@ -142,7 +126,7 @@ func TestPackage_GetPathInfo_WithParent(t *testing.T) { defer func() { _ = pkg.Close() }() fpkg := pkg.(*filePackage) - parentPath := "parent" + parentPath := parentPathStr childPath := "parent/child" parent := createValidPathMetadataEntry(parentPath, metadata.PathMetadataTypeDirectory) child := createValidPathMetadataEntry(childPath, metadata.PathMetadataTypeFile) @@ -167,7 +151,7 @@ func TestPackage_GetPathInfo_WithSubDirs(t *testing.T) { defer func() { _ = pkg.Close() }() fpkg := pkg.(*filePackage) - parentPath := "parent" + parentPath := parentPathStr subDir1 := "parent/subdir1" subDir2 := "parent/subdir2" parent := createValidPathMetadataEntry(parentPath, metadata.PathMetadataTypeDirectory) @@ -237,26 +221,34 @@ func TestPackage_ListPaths_Success(t *testing.T) { } } -// TestPackage_ListPaths_Empty tests ListPaths with empty entries. -func TestPackage_ListPaths_Empty(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } +func runEmptyPathMetadataCall(t *testing.T, callListPaths bool) { + t.Helper() + pkg, fpkg := setupOpenFilePackageForPathMetadata(t) defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - fpkg.PathMetadataEntries = []*metadata.PathMetadataEntry{} - - pathInfos, err := fpkg.ListPaths() - if err != nil { - t.Errorf("ListPaths should succeed with empty entries, got error: %v", err) - } - if len(pathInfos) != 0 { - t.Errorf("ListPaths should return empty slice, got %d", len(pathInfos)) + if callListPaths { + pathInfos, err := fpkg.ListPaths() + if err != nil { + t.Errorf("ListPaths should succeed with empty entries, got error: %v", err) + } + if len(pathInfos) != 0 { + t.Errorf("ListPaths should return empty slice, got %d", len(pathInfos)) + } + } else { + hierarchy, err := fpkg.GetPathHierarchy() + if err != nil { + t.Errorf("GetPathHierarchy should succeed with empty entries, got 
error: %v", err) + } + if len(hierarchy) != 0 { + t.Errorf("GetPathHierarchy should return empty map, got %d entries", len(hierarchy)) + } } } +// TestPackage_ListPaths_Empty tests ListPaths with empty entries. +func TestPackage_ListPaths_Empty(t *testing.T) { + runEmptyPathMetadataCall(t, true) +} + // ============================================================================= // TEST: GetPathHierarchy with cached entries // ============================================================================= @@ -270,7 +262,7 @@ func TestPackage_GetPathHierarchy_Success(t *testing.T) { defer func() { _ = pkg.Close() }() fpkg := pkg.(*filePackage) - parentPath := "parent" + parentPath := parentPathStr child1 := "parent/child1" child2 := "parent/child2" rootPath := "root" @@ -297,22 +289,7 @@ func TestPackage_GetPathHierarchy_Success(t *testing.T) { // TestPackage_GetPathHierarchy_Empty tests GetPathHierarchy with empty entries. func TestPackage_GetPathHierarchy_Empty(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - fpkg.PathMetadataEntries = []*metadata.PathMetadataEntry{} - - hierarchy, err := fpkg.GetPathHierarchy() - if err != nil { - t.Errorf("GetPathHierarchy should succeed with empty entries, got error: %v", err) - } - if len(hierarchy) != 0 { - t.Errorf("GetPathHierarchy should return empty map, got %d entries", len(hierarchy)) - } + runEmptyPathMetadataCall(t, false) } // TestPackage_GetPathHierarchy_AllRoot tests GetPathHierarchy with all root paths. 
diff --git a/api/go/novus_package/package_path_metadata_test.go b/api/go/novus_package/package_path_metadata_test.go index 24dcb3ee..9e79188e 100644 --- a/api/go/novus_package/package_path_metadata_test.go +++ b/api/go/novus_package/package_path_metadata_test.go @@ -43,19 +43,9 @@ func TestPackage_GetPathMetadata_Basic(t *testing.T) { // TestPackage_GetPathMetadata_WithContext tests GetPathMetadata with context scenarios. func TestPackage_GetPathMetadata_WithContext(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - // Test with cancelled context - cancelledCtx := testhelpers.CancelledContext() - fpkg := pkg.(*filePackage) - _, err = fpkg.GetPathMetadata(cancelledCtx) - if err == nil { - t.Error("GetPathMetadata should fail with cancelled context") - } + runWithCancelledContext(t, func(fpkg *filePackage, ctx context.Context) (interface{}, error) { + return fpkg.GetPathMetadata(ctx) + }, "GetPathMetadata") } // ============================================================================= @@ -580,48 +570,15 @@ func TestPackage_UpdatePath_ContextCancelled(t *testing.T) { // TestPackage_ValidatePathMetadata_ContextCancelled tests ValidatePathMetadata with cancelled context. 
func TestPackage_ValidatePathMetadata_ContextCancelled(t *testing.T) { - ctx := testhelpers.CancelledContext() - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - err = fpkg.ValidatePathMetadata(ctx) - if err == nil { - t.Error("ValidatePathMetadata() should fail with cancelled context") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeContext { - t.Errorf("Expected error type Context, got: %v", pkgErr.Type) - } + runContextCancelledTest(t, func(fpkg *filePackage, ctx context.Context) error { + return fpkg.ValidatePathMetadata(ctx) + }) } // TestPackage_GetPathConflicts_ContextCancelled tests GetPathConflicts with cancelled context. func TestPackage_GetPathConflicts_ContextCancelled(t *testing.T) { - ctx := testhelpers.CancelledContext() - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - fpkg := pkg.(*filePackage) - _, err = fpkg.GetPathConflicts(ctx) - if err == nil { - t.Error("GetPathConflicts() should fail with cancelled context") - } - - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Fatalf("Expected PackageError, got: %T", err) - } - if pkgErr.Type != pkgerrors.ErrTypeContext { - t.Errorf("Expected error type Context, got: %v", pkgErr.Type) - } + runContextCancelledTest(t, func(fpkg *filePackage, ctx context.Context) error { + _, err := fpkg.GetPathConflicts(ctx) + return err + }) } diff --git a/api/go/novus_package/package_reader.go b/api/go/novus_package/package_reader.go index e7d4aac0..56e0c56f 100644 --- a/api/go/novus_package/package_reader.go +++ b/api/go/novus_package/package_reader.go @@ -1,4 +1,4 @@ -// This file implements PackageReader interface methods: ReadFile, ListFiles, +// This file implements package 
read operations: ReadFile, ListFiles, // GetMetadata, Validate, and GetInfo. It contains all read-only operations // for accessing package contents and metadata as specified in api_core.md. // This file should contain methods for reading files, listing package contents, @@ -65,7 +65,7 @@ import ( // fmt.Printf("Compressed Size: %d bytes\n", info.FilesCompressedSize) // fmt.Printf("Created: %v\n", info.Created) // -// Specification: api_core.md: 1.1.4 ListFiles Method Contract +// Specification: api_core.md: 1.2.5 Package.GetInfo Method func (p *filePackage) GetInfo() (*metadata.PackageInfo, error) { // GetInfo is an in-memory operation that is allowed after Close() as long as // metadata remains available, but it should not work after CloseWithCleanup(). @@ -95,7 +95,7 @@ func (p *filePackage) GetInfo() (*metadata.PackageInfo, error) { // Error Conditions: // - ErrTypeValidation: Package is closed or metadata not loaded // -// Specification: api_core.md: 1.1.6 GetMetadata Method Contract +// Specification: api_core.md: 1.2.6 Package.GetMetadata Method func (p *filePackage) GetMetadata() (*metadata.PackageMetadata, error) { // GetMetadata is an in-memory operation that is allowed after Close() as long // as metadata remains available, but it should not work for a package that has @@ -141,122 +141,90 @@ func (p *filePackage) GetMetadata() (*metadata.PackageMetadata, error) { // - ErrTypeValidation: Path is invalid or file not found // - ErrTypeIO: Failed to read file data // -// Specification: api_core.md: 1.1.3 ReadFile Method Contract +// Specification: api_core.md: 1.2.2 Package.ReadFile Method func (p *filePackage) ReadFile(ctx context.Context, path string) ([]byte, error) { - // Validate context - if err := internal.CheckContext(ctx, "ReadFile"); err != nil { - return nil, pkgerrors.WrapError(err, pkgerrors.ErrTypeContext, "error during ReadFile: context validation failed") + normalizedPath, fileEntry, err := p.readFileValidateAndResolve(ctx, path) + if err != 
nil { + return nil, err + } + _ = normalizedPath // used only for resolution; fileEntry is the resolved entry + if fileEntry.IsDataLoaded { + return fileEntry.Data, nil } + return p.readFileDataFromSource(ctx, fileEntry) +} - // Check if package is open +// readFileValidateAndResolve validates context, package state, path, and resolves the FileEntry. +func (p *filePackage) readFileValidateAndResolve(ctx context.Context, path string) (string, *metadata.FileEntry, error) { + if err := internal.CheckContext(ctx, "ReadFile"); err != nil { + return "", nil, pkgerrors.WrapError(err, pkgerrors.ErrTypeContext, "error during ReadFile: context validation failed") + } if !p.isOpen { - return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "package is not open", nil, struct{}{}) + return "", nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "package is not open", nil, struct{}{}) } - - // Validate path if err := internal.ValidatePackagePath(path); err != nil { - return nil, pkgerrors.WrapError(err, pkgerrors.ErrTypeValidation, "error during ReadFile: path validation failed") + return "", nil, pkgerrors.WrapError(err, pkgerrors.ErrTypeValidation, "error during ReadFile: path validation failed") } - - // Normalize path for comparison normalizedPath, err := internal.NormalizePackagePath(path) if err != nil { - return nil, pkgerrors.WrapError(err, pkgerrors.ErrTypeValidation, "error during ReadFile: path normalization failed") + return "", nil, pkgerrors.WrapError(err, pkgerrors.ErrTypeValidation, "error during ReadFile: path normalization failed") } - - // Find FileEntry by normalized path fileEntry, err := p.findFileEntryByPath(normalizedPath) if err != nil { - return nil, pkgerrors.WrapError(err, pkgerrors.ErrTypeValidation, "error during ReadFile: file not found") - } - - // Check if data is already loaded in memory (from StageFile or not-yet-written entries) - if fileEntry.IsDataLoaded { - // Return in-memory data directly (no need for 
decryption/decompression in baseline) - // This works even when fileHandle is nil (for newly written files) - return fileEntry.Data, nil + return "", nil, pkgerrors.WrapError(err, pkgerrors.ErrTypeValidation, "error during ReadFile: file not found") } + return normalizedPath, fileEntry, nil +} - // Check context cancellation before I/O +// readFileDataFromSource reads file data from the package file using SourceFile/SourceOffset. +func (p *filePackage) readFileDataFromSource(ctx context.Context, fileEntry *metadata.FileEntry) ([]byte, error) { select { case <-ctx.Done(): return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeContext, "context cancelled", ctx.Err(), struct{}{}) default: } - - // Use the stored SourceFile and SourceOffset to locate file data - // For opened packages, SourceFile points to the package file and SourceOffset is the file data offset if fileEntry.SourceFile == nil { return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "file source is not available", nil, pkgerrors.ValidationErrorContext{ - Field: "SourceFile", - Value: "nil", - Expected: "valid file handle", + Field: "SourceFile", Value: "nil", Expected: "valid file handle", }) } - if fileEntry.SourceOffset == 0 { return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "file source offset is not set", nil, pkgerrors.ValidationErrorContext{ - Field: "SourceOffset", - Value: 0, - Expected: "valid file offset", + Field: "SourceOffset", Value: 0, Expected: "valid file offset", }) } - - // Seek to file data using stored offset if _, err := fileEntry.SourceFile.Seek(fileEntry.SourceOffset, 0); err != nil { return nil, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to seek to file data", pkgerrors.ValidationErrorContext{ - Field: "SourceOffset", - Value: fileEntry.SourceOffset, - Expected: "seek successful", + Field: "SourceOffset", Value: fileEntry.SourceOffset, Expected: "seek successful", }) } - - // Check context cancellation during I/O select { case 
<-ctx.Done(): return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeContext, "context cancelled", ctx.Err(), struct{}{}) default: } - - // Read file data (stored size) data := make([]byte, fileEntry.StoredSize) n, err := fileEntry.SourceFile.Read(data) if err != nil && err != io.EOF { return nil, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read file data", pkgerrors.ValidationErrorContext{ - Field: "StoredSize", - Value: fileEntry.StoredSize, - Expected: "read successful", + Field: "StoredSize", Value: fileEntry.StoredSize, Expected: "read successful", }) } - if uint64(n) != fileEntry.StoredSize { return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "incomplete file data read", nil, pkgerrors.ValidationErrorContext{ - Field: "Data", - Value: n, - Expected: fmt.Sprintf("%d bytes", fileEntry.StoredSize), + Field: "Data", Value: n, Expected: fmt.Sprintf("%d bytes", fileEntry.StoredSize), }) } - - // TODO: Apply decryption if file is encrypted if fileEntry.EncryptionType != 0 { - // Decryption not yet implemented return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeUnsupported, "file decryption not yet implemented", nil, pkgerrors.ValidationErrorContext{ - Field: "EncryptionType", - Value: fileEntry.EncryptionType, - Expected: "decryption support", + Field: "EncryptionType", Value: fileEntry.EncryptionType, Expected: "decryption support", }) } - - // TODO: Apply decompression if file is compressed if fileEntry.CompressionType != 0 { - // Decompression not yet implemented return nil, pkgerrors.NewPackageError(pkgerrors.ErrTypeUnsupported, "file decompression not yet implemented", nil, pkgerrors.ValidationErrorContext{ - Field: "CompressionType", - Value: fileEntry.CompressionType, - Expected: "decompression support", + Field: "CompressionType", Value: fileEntry.CompressionType, Expected: "decompression support", }) } - return data, nil } @@ -267,7 +235,7 @@ func (p *filePackage) ReadFile(ctx context.Context, path string) ([]byte, 
error) // perform I/O, so it does not accept a context parameter. // // The results are stable across calls when package state is unchanged. -// Files removed from in-memory package (via UnstageFile) are excluded. +// Files removed from in-memory package (via RemoveFile) are excluded. // // Returns: // - []FileInfo: Sorted list of file information @@ -276,7 +244,9 @@ func (p *filePackage) ReadFile(ctx context.Context, path string) ([]byte, error) // Error Conditions: // - ErrTypeValidation: Package is closed or path normalization fails // -// Specification: api_core.md: 1.1.3 ReadFile Method Contract +// Specification: api_core.md: 1.2.3 Package.ListFiles Method +// +//nolint:gocognit // iteration and metadata branches func (p *filePackage) ListFiles() ([]FileInfo, error) { // CloseWithCleanup clears in-memory state. // ListFiles is an in-memory operation and is allowed after Close as long as @@ -296,6 +266,13 @@ func (p *filePackage) ListFiles() ([]FileInfo, error) { continue } + // Exclude internal special metadata files from user-visible listings. + if p.SpecialFiles != nil { + if _, ok := p.SpecialFiles[entry.Type]; ok { + continue + } + } + // Collect and convert all paths for this entry displayPaths := make([]string, 0, len(entry.Paths)) @@ -412,7 +389,7 @@ func (p *filePackage) ListFiles() ([]FileInfo, error) { // } // fmt.Println("Package is valid") // -// Specification: api_basic_operations.md: 9.1 Package Validation +// Specification: api_basic_operations.md: 15. 
Package.Validate Method func (p *filePackage) Validate(ctx context.Context) error { // Validate context if err := internal.CheckContext(ctx, "Validate"); err != nil { @@ -429,13 +406,13 @@ func (p *filePackage) Validate(ctx context.Context) error { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "package header is nil", nil, struct{}{}) } - if err := p.header.Validate(); err != nil { + if err := validatePackageHeader(p.header); err != nil { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "invalid package header", err, struct{}{}) } // Validate file index if present if p.index != nil { - if err := p.index.Validate(); err != nil { + if err := validateFileIndex(p.index); err != nil { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "invalid file index", err, struct{}{}) } } diff --git a/api/go/novus_package/package_reader_additional_test.go b/api/go/novus_package/package_reader_additional_test.go index 38231f7a..bf64c877 100644 --- a/api/go/novus_package/package_reader_additional_test.go +++ b/api/go/novus_package/package_reader_additional_test.go @@ -3,6 +3,7 @@ package novus_package import ( + "bytes" "context" "os" "path/filepath" @@ -37,7 +38,7 @@ func TestReadFile_AfterAddFromMemory(t *testing.T) { } // Verify data matches - if string(readData) != string(originalData) { + if !bytes.Equal(readData, originalData) { t.Errorf("ReadFile data = %q, want %q", string(readData), string(originalData)) } } @@ -66,8 +67,7 @@ func TestReadFile_AfterWrite(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write note: %v (may be expected for incomplete implementation)", err) - return + t.Fatalf("Write failed: %v", err) } // Close and reopen @@ -78,8 +78,7 @@ func TestReadFile_AfterWrite(t *testing.T) { // Open for reading pkg2, err := OpenPackage(ctx, tmpFile) if err != nil { - t.Logf("OpenPackage note: %v (may require complete Write implementation)", err) - return + t.Fatalf("OpenPackage failed: %v", err) } defer func() 
{ _ = pkg2.Close() }() @@ -89,7 +88,7 @@ func TestReadFile_AfterWrite(t *testing.T) { t.Fatalf("ReadFile failed: %v", err) } - if string(readData) != string(originalData) { + if !bytes.Equal(readData, originalData) { t.Errorf("ReadFile data = %q, want %q", string(readData), string(originalData)) } } @@ -176,42 +175,12 @@ func TestReadFile_ContextCancelled(t *testing.T) { // TestGetMetadata_Basic tests GetMetadata method. func TestGetMetadata_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - metadata, err := pkg.GetMetadata() - if err != nil { - t.Logf("GetMetadata note: %v (may require initialization)", err) - return - } - - if metadata == nil { - t.Error("GetMetadata returned nil") - } + runGetMetadataBasic(t) } // TestGetInfo_Basic tests GetInfo method. func TestGetInfo_Basic(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - info, err := pkg.GetInfo() - if err != nil { - t.Fatalf("GetInfo failed: %v", err) - } - - if info == nil { - t.Fatal("GetInfo returned nil") - } - - // Verify basic info fields - if info.FormatVersion == 0 { - t.Error("Info.FormatVersion = 0, expected non-zero") - } + runGetInfoBasic(t, true) } // TestValidate_EmptyPackage tests validating an empty package. 
@@ -331,7 +300,7 @@ func TestWrite_EmptyPackage(t *testing.T) { err = pkg.Write(ctx) if err != nil { - t.Logf("Write empty package: %v (may be expected)", err) + t.Fatalf("Write empty package failed: %v", err) } } @@ -344,15 +313,7 @@ func TestWrite_WithFiles(t *testing.T) { ctx := context.Background() - // Add files - for i := 0; i < 3; i++ { - path := "file" + string(rune('0'+i)) + ".txt" - data := []byte("Content for file " + string(rune('0'+i))) - _, err := pkg.AddFileFromMemory(ctx, path, data, nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - } + addThreeFilesFromMemory(t, ctx, pkg, "", "Content for file ") tmpFile := filepath.Join(t.TempDir(), "withfiles.pkg") if err := pkg.SetTargetPath(ctx, tmpFile); err != nil { @@ -361,8 +322,7 @@ func TestWrite_WithFiles(t *testing.T) { err = pkg.Write(ctx) if err != nil { - t.Logf("Write with files: %v (implementation may be incomplete)", err) - return + t.Fatalf("Write with files failed: %v", err) } // Verify file was created @@ -393,7 +353,7 @@ func TestSafeWrite_NoOverwrite(t *testing.T) { } // Create the file first - if err := os.WriteFile(tmpFile, []byte("existing"), 0644); err != nil { + if err := os.WriteFile(tmpFile, []byte("existing"), 0o644); err != nil { t.Fatalf("Failed to create existing file: %v", err) } @@ -426,13 +386,13 @@ func TestSafeWrite_WithOverwrite(t *testing.T) { } // Create the file first - if err := os.WriteFile(tmpFile, []byte("old data"), 0644); err != nil { + if err := os.WriteFile(tmpFile, []byte("old data"), 0o644); err != nil { t.Fatalf("Failed to create existing file: %v", err) } // SafeWrite with overwrite (should succeed) err = pkg.SafeWrite(ctx, true) if err != nil { - t.Logf("SafeWrite with overwrite: %v (implementation may be incomplete)", err) + t.Fatalf("SafeWrite with overwrite failed: %v", err) } } diff --git a/api/go/novus_package/package_reader_coverage_test.go b/api/go/novus_package/package_reader_coverage_test.go index 65f680bd..49427471 100644 
--- a/api/go/novus_package/package_reader_coverage_test.go +++ b/api/go/novus_package/package_reader_coverage_test.go @@ -6,39 +6,18 @@ package novus_package import ( + "bytes" "context" "path/filepath" "testing" ) func TestPackage_ReadFile_NotFound(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Try to read non-existent file - _, err = pkg.ReadFile(ctx, "/nonexistent.txt") - if err == nil { - t.Error("ReadFile with non-existent path should fail") - } + runReadFileExpectFail(t, "/nonexistent.txt") } func TestPackage_ReadFile_EmptyPath(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Try to read with empty path - _, err = pkg.ReadFile(ctx, "") - if err == nil { - t.Error("ReadFile with empty path should fail") - } + runReadFileExpectFail(t, "") } func TestPackage_ReadFile_NotOpen(t *testing.T) { @@ -61,37 +40,34 @@ func TestPackage_ReadFile_NotOpen(t *testing.T) { } } -func TestPackage_ReadFile_FromDisk(t *testing.T) { +func runReadFileFromDisk(t *testing.T, testContent []byte) { + t.Helper() pkg, err := NewPackage() if err != nil { t.Fatalf("NewPackage failed: %v", err) } - ctx := context.Background() - - // Create package to open it (required for ReadFile) tmpPkg := filepath.Join(t.TempDir(), "test.pkg") if err := pkg.Create(ctx, tmpPkg); err != nil { t.Fatalf("Create failed: %v", err) } - - testContent := []byte("content from disk") entry, err := pkg.AddFileFromMemory(ctx, "/test.txt", testContent, nil) if err != nil { t.Fatalf("AddFileFromMemory failed: %v", err) } - - // Read file data, err := pkg.ReadFile(ctx, entry.Paths[0].Path) if err != nil { t.Fatalf("ReadFile failed: %v", err) } - - if string(data) != string(testContent) { + if !bytes.Equal(data, testContent) { t.Errorf("ReadFile content mismatch: got %q, want %q", string(data), string(testContent)) } } +func 
TestPackage_ReadFile_FromDisk(t *testing.T) { + runReadFileFromDisk(t, []byte("content from disk")) +} + func TestPackage_ListFiles_Empty(t *testing.T) { pkg, err := NewPackage() if err != nil { @@ -201,15 +177,13 @@ func TestPackage_ListFiles_RoundTrip(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("Write failed: %v", err) } // Reopen and list again pkg2, err := OpenPackage(ctx, tmpPkg) if err != nil { - t.Logf("OpenPackage failed: %v (may require complete Write implementation)", err) - return + t.Fatalf("OpenPackage failed: %v", err) } defer func() { _ = pkg2.Close() }() @@ -265,15 +239,7 @@ func TestPackage_GetInfo_Coverage(t *testing.T) { t.Fatal("GetInfo returned nil") } - // Add files (use different content to avoid deduplication) - for i := 0; i < 3; i++ { - path := "/file" + string(rune('0'+i)) + ".txt" - content := []byte("content " + string(rune('0'+i))) - _, err := pkg.AddFileFromMemory(ctx, path, content, nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - } + addThreeFilesFromMemory(t, ctx, pkg, "/", "content ") // Get info after adding files info2, err := pkg.GetInfo() @@ -312,20 +278,7 @@ func TestPackage_GetMetadata_Coverage(t *testing.T) { } func TestPackage_GetMetadata_NoMetadataLoaded(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - // Get metadata from new package (may require package to be opened/loaded) - metadata, err := pkg.GetMetadata() - if err != nil { - t.Logf("GetMetadata failed: %v (may require package to be opened/loaded)", err) - return - } - if metadata == nil { - t.Fatal("GetMetadata returned nil") - } + runGetMetadataBasic(t) } func TestPackage_OpenPackage_FileNotFound(t *testing.T) { diff --git a/api/go/novus_package/package_reader_test.go b/api/go/novus_package/package_reader_test.go index d9c7c8bc..cd18c903 100644 --- 
a/api/go/novus_package/package_reader_test.go +++ b/api/go/novus_package/package_reader_test.go @@ -5,6 +5,7 @@ package novus_package import ( + "bytes" "context" "path/filepath" "testing" @@ -17,20 +18,28 @@ import ( "github.com/novus-engine/novuspack/api/go/pkgerrors" ) -// TestPackage_GetInfo_Basic tests basic package information retrieval. -func TestPackage_GetInfo_Basic(t *testing.T) { +const testPkgPath = "/test/package.nvpk" + +func newFilePackageWithState(info *metadata.PackageInfo, filePath string, fileEntries []*metadata.FileEntry) *filePackage { + pkg, _ := NewPackage() + fpkg := pkg.(*filePackage) + fpkg.Info = info + fpkg.FilePath = filePath + fpkg.FileEntries = fileEntries + return fpkg +} + +func runOpenPackageGetInfoAssert(t *testing.T) { + t.Helper() ctx := context.Background() tmpDir := t.TempDir() pkgPath := filepath.Join(tmpDir, "test.nvpk") testutil.CreateTestPackageFile(t, pkgPath) - pkg, err := OpenPackage(ctx, pkgPath) if err != nil { t.Fatalf("OpenPackage() failed: %v", err) } defer func() { _ = pkg.Close() }() - - // Test: GetInfo should return package information info, err := pkg.GetInfo() if err != nil { t.Errorf("GetInfo() failed: %v", err) @@ -40,27 +49,14 @@ func TestPackage_GetInfo_Basic(t *testing.T) { } } +// TestPackage_GetInfo_Basic tests basic package information retrieval. +func TestPackage_GetInfo_Basic(t *testing.T) { + runOpenPackageGetInfoAssert(t) +} + // TestPackage_GetInfo_WithContext tests GetInfo with context scenarios. 
func TestPackage_GetInfo_WithContext(t *testing.T) { - ctx := context.Background() - tmpDir := t.TempDir() - pkgPath := filepath.Join(tmpDir, "test.nvpk") - testutil.CreateTestPackageFile(t, pkgPath) - - pkg, err := OpenPackage(ctx, pkgPath) - if err != nil { - t.Fatalf("OpenPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - // GetInfo is a pure in-memory operation and does not accept context - info, err := pkg.GetInfo() - if err != nil { - t.Errorf("GetInfo() failed: %v", err) - } - if info == nil { - t.Error("GetInfo() returned nil") - } + runOpenPackageGetInfoAssert(t) } // TestPackage_GetInfo_AfterCreate tests GetInfo after Create and Open. @@ -101,53 +97,12 @@ func TestPackage_GetInfo_AfterCreate(t *testing.T) { // TestPackage_GetInfo_OnNew tests GetInfo on newly created package. func TestPackage_GetInfo_OnNew(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage() failed: %v", err) - } - - // GetInfo on new package should succeed (metadata is initialized) - info, err := pkg.GetInfo() - if err != nil { - t.Fatalf("GetInfo() failed on newly created package: %v", err) - } - - if info == nil { - t.Fatal("GetInfo() returned nil") - } - - // Verify format version is set - if info.FormatVersion == 0 { - t.Error("FormatVersion should be non-zero for newly created package") - } + runGetInfoBasic(t, true) } // TestPackage_GetInfo_OnClosedPackage tests GetInfo after Close. func TestPackage_GetInfo_OnClosedPackage(t *testing.T) { - ctx := context.Background() - // Open a package to ensure metadata is loaded, then Close it. 
- tmpDir := t.TempDir() - pkgPath := filepath.Join(tmpDir, "test.nvpk") - testutil.CreateTestPackageFile(t, pkgPath) - - pkg, err := OpenPackage(ctx, pkgPath) - if err != nil { - t.Fatalf("OpenPackage() failed: %v", err) - } - - // Close the package (in-memory metadata should remain available) - if err := pkg.Close(); err != nil { - t.Fatalf("Close() failed: %v", err) - } - - // GetInfo on closed package should succeed if metadata is still in memory - info, err := pkg.GetInfo() - if err != nil { - t.Fatalf("GetInfo() should succeed on a closed package with cached metadata, got error: %v", err) - } - if info == nil { - t.Fatal("GetInfo() returned nil info") - } + runAssertGetInfoOnClosed(t) } // TestPackage_GetInfo_WithCancelledContext tests GetInfo (no longer uses context). @@ -257,88 +212,19 @@ func TestPackage_ReadFile(t *testing.T) { // TestPackage_ListFiles tests the ListFiles method. func TestPackage_ListFiles(t *testing.T) { - ctx := context.Background() - tmpDir := t.TempDir() - pkgPath := filepath.Join(tmpDir, "test.nvpk") - testutil.CreateTestPackageFile(t, pkgPath) - - pkg, err := OpenPackage(ctx, pkgPath) - if err != nil { - t.Fatalf("OpenPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - // ListFiles on an empty package should return empty list - files, err := pkg.ListFiles() - if err != nil { - t.Fatalf("ListFiles() failed: %v", err) - } - if files == nil { - t.Fatal("ListFiles() should not return nil") - } - if len(files) != 0 { - t.Errorf("ListFiles() on empty package should return empty list, got %d files", len(files)) - } + runOpenPackageListFilesExpectEmpty(t) } // TestPackage_ListFiles_WithFiles tests ListFiles with files in package. // Note: This test is simplified to test the basic functionality. // Full file entry loading is tested in OpenPackage tests. 
func TestPackage_ListFiles_WithFiles(t *testing.T) { - ctx := context.Background() - tmpDir := t.TempDir() - pkgPath := filepath.Join(tmpDir, "test.nvpk") - - // Create a minimal package file - testutil.CreateTestPackageFile(t, pkgPath) - - // Open package - pkg, err := OpenPackage(ctx, pkgPath) - if err != nil { - t.Fatalf("OpenPackage() failed: %v", err) - } - defer func() { _ = pkg.Close() }() - - // ListFiles should return empty list for empty package - files, err := pkg.ListFiles() - if err != nil { - t.Fatalf("ListFiles() failed: %v", err) - } - if files == nil { - t.Fatal("ListFiles() should not return nil") - } - // Empty package should return empty list - if len(files) != 0 { - t.Errorf("ListFiles() on empty package should return empty list, got %d files", len(files)) - } + runOpenPackageListFilesExpectEmpty(t) } // TestPackage_ListFiles_ClosedPackage tests ListFiles on closed package. func TestPackage_ListFiles_ClosedPackage(t *testing.T) { - ctx := context.Background() - // Open a package to ensure metadata is loaded, then Close it. - tmpDir := t.TempDir() - pkgPath := filepath.Join(tmpDir, "test.nvpk") - testutil.CreateTestPackageFile(t, pkgPath) - - pkg, err := OpenPackage(ctx, pkgPath) - if err != nil { - t.Fatalf("OpenPackage() failed: %v", err) - } - - // Close the package (in-memory metadata should remain available) - if err := pkg.Close(); err != nil { - t.Fatalf("Close() failed: %v", err) - } - - // ListFiles on closed package should succeed if metadata is still in memory - files, err := pkg.ListFiles() - if err != nil { - t.Fatalf("ListFiles() should succeed on a closed package with cached metadata, got error: %v", err) - } - if files == nil { - t.Fatal("ListFiles() should not return nil") - } + runAssertListFilesOnClosed(t) } // TestPackage_ListFiles_StableSorting tests that ListFiles returns stable results. 
@@ -381,6 +267,8 @@ func TestPackage_ListFiles_StableSorting(t *testing.T) { } // TestPackage_ListFiles_FileInfoFields tests that FileInfo contains all required fields. +// +//nolint:gocognit,gocyclo // table-driven file-info cases func TestPackage_ListFiles_FileInfoFields(t *testing.T) { // Setup: Create a new package in memory pkg, err := NewPackage() @@ -391,7 +279,7 @@ func TestPackage_ListFiles_FileInfoFields(t *testing.T) { fpkg := pkg.(*filePackage) fpkg.isOpen = true // Mark package as open - fpkg.FilePath = "/test/package.nvpk" // Set package path + fpkg.FilePath = testPkgPath // Set package path fpkg.Info = metadata.NewPackageInfo() // Initialize Info so metadata is considered loaded fpkg.FileEntries = []*metadata.FileEntry{} // Initialize FileEntries slice fpkg.PathMetadataEntries = []*metadata.PathMetadataEntry{} // Initialize PathMetadataEntries @@ -636,6 +524,8 @@ func TestPackage_ListFiles_FileInfoFields_Uncompressed(t *testing.T) { } // TestPackage_ListFiles_EdgeCases tests edge cases in ListFiles for coverage. +// +//nolint:gocognit,gocyclo // table-driven edge cases func TestPackage_ListFiles_EdgeCases(t *testing.T) { // Setup: Create a new package in memory pkg, err := NewPackage() @@ -789,60 +679,22 @@ func TestPackage_ListFiles_EdgeCases(t *testing.T) { } // TestPackage_ListFiles_NoMetadataLoaded tests ListFiles when metadata is not loaded. 
+// +//nolint:gocognit // table-driven no-metadata cases func TestPackage_ListFiles_NoMetadataLoaded(t *testing.T) { tests := []struct { name string setupPackage func() *filePackage wantErr bool }{ - { - name: "Info is nil", - setupPackage: func() *filePackage { - pkg, _ := NewPackage() - fpkg := pkg.(*filePackage) - fpkg.Info = nil - fpkg.FilePath = "/test/package.nvpk" - fpkg.FileEntries = []*metadata.FileEntry{} - return fpkg - }, - wantErr: true, - }, - { - name: "FilePath is empty", - setupPackage: func() *filePackage { - pkg, _ := NewPackage() - fpkg := pkg.(*filePackage) - fpkg.Info = metadata.NewPackageInfo() - fpkg.FilePath = "" - fpkg.FileEntries = []*metadata.FileEntry{} - return fpkg - }, - wantErr: false, // Changed: FilePath can be empty for newly created packages - }, - { - name: "FileEntries is nil", - setupPackage: func() *filePackage { - pkg, _ := NewPackage() - fpkg := pkg.(*filePackage) - fpkg.Info = metadata.NewPackageInfo() - fpkg.FilePath = "/test/package.nvpk" - fpkg.FileEntries = nil - return fpkg - }, - wantErr: true, - }, - { - name: "All fields valid", - setupPackage: func() *filePackage { - pkg, _ := NewPackage() - fpkg := pkg.(*filePackage) - fpkg.Info = metadata.NewPackageInfo() - fpkg.FilePath = "/sp/package.nvpk" - fpkg.FileEntries = []*metadata.FileEntry{} - return fpkg - }, - wantErr: false, - }, + {"Info is nil", func() *filePackage { return newFilePackageWithState(nil, testPkgPath, []*metadata.FileEntry{}) }, true}, + {"FilePath is empty", func() *filePackage { + return newFilePackageWithState(metadata.NewPackageInfo(), "", []*metadata.FileEntry{}) + }, false}, + {"FileEntries is nil", func() *filePackage { return newFilePackageWithState(metadata.NewPackageInfo(), testPkgPath, nil) }, true}, + {"All fields valid", func() *filePackage { + return newFilePackageWithState(metadata.NewPackageInfo(), "/sp/package.nvpk", []*metadata.FileEntry{}) + }, false}, } for _, tt := range tests { @@ -1257,7 +1109,7 @@ func 
TestPackage_ReadFile_SourceFileNil(t *testing.T) { t.Fatalf("ReadFile() failed: %v", err) } - if string(data) != string(testContent) { + if !bytes.Equal(data, testContent) { t.Errorf("ReadFile() content mismatch: got %q, want %q", string(data), string(testContent)) } } @@ -1298,7 +1150,7 @@ func TestPackage_ReadFile_SourceOffsetZero(t *testing.T) { t.Fatalf("ReadFile() failed: %v", err) } - if string(data) != string(testContent) { + if !bytes.Equal(data, testContent) { t.Errorf("ReadFile() content mismatch: got %q, want %q", string(data), string(testContent)) } } diff --git a/api/go/novus_package/package_session.go b/api/go/novus_package/package_session.go index 341a818a..80e8cb36 100644 --- a/api/go/novus_package/package_session.go +++ b/api/go/novus_package/package_session.go @@ -2,7 +2,7 @@ // It provides methods for setting, getting, clearing, and checking the session base path, // which is used for automatic path derivation when adding files from absolute filesystem paths. // -// Specification: api_basic_operations.md: 1. Context Integration +// Specification: api_basic_operations.md: 19 Package Session Base Management package novus_package @@ -35,7 +35,7 @@ import ( // return err // } // -// Specification: api_basic_operations.md: 1. Context Integration +// Specification: api_basic_operations.md: 19.4 Package.SetSessionBase Method func (p *filePackage) SetSessionBase(basePath string) error { // Validate that basePath is not empty if basePath == "" { @@ -84,7 +84,7 @@ func (p *filePackage) SetSessionBase(basePath string) error { // fmt.Println("No session base set") // } // -// Specification: api_basic_operations.md: 1. 
Context Integration +// Specification: api_basic_operations.md: 19.5 Package.GetSessionBase Method func (p *filePackage) GetSessionBase() string { return p.sessionBase } @@ -99,7 +99,7 @@ func (p *filePackage) GetSessionBase() string { // pkg.ClearSessionBase() // // Next absolute file path will establish new session base // -// Specification: api_basic_operations.md: 9.6 Session Base Management +// Specification: api_basic_operations.md: 19.6 Package.ClearSessionBase Method func (p *filePackage) ClearSessionBase() { p.sessionBase = "" } @@ -115,7 +115,7 @@ func (p *filePackage) ClearSessionBase() { // fmt.Printf("Session base: %s\n", pkg.GetSessionBase()) // } // -// Specification: api_basic_operations.md: 9.6 Session Base Management +// Specification: api_basic_operations.md: 19.7 Package.HasSessionBase Method func (p *filePackage) HasSessionBase() bool { return p.sessionBase != "" } diff --git a/api/go/novus_package/package_session_test.go b/api/go/novus_package/package_session_test.go index 7c1ee4af..a145cb6a 100644 --- a/api/go/novus_package/package_session_test.go +++ b/api/go/novus_package/package_session_test.go @@ -6,6 +6,8 @@ import ( "github.com/novus-engine/novuspack/api/go/pkgerrors" ) +const sessionBasePath = "/home/user/project" + // TestSetSessionBase_Success tests successful session base setting. 
func TestSetSessionBase_Success(t *testing.T) { pkg, err := NewPackage() @@ -19,7 +21,7 @@ func TestSetSessionBase_Success(t *testing.T) { }{ { name: "Unix absolute path", - basePath: "/home/user/project", + basePath: sessionBasePath, }, // Note: Windows path testing skipped - filepath.IsAbs is platform-specific // On Linux, Windows paths like "C:\\" are not considered absolute @@ -122,7 +124,7 @@ func TestGetSessionBase(t *testing.T) { } // Set a base path - basePath := "/home/user/project" + basePath := sessionBasePath if err := pkg.SetSessionBase(basePath); err != nil { t.Fatalf("SetSessionBase failed: %v", err) } @@ -142,7 +144,7 @@ func TestClearSessionBase(t *testing.T) { } // Set a session base - basePath := "/home/user/project" + basePath := sessionBasePath if err := pkg.SetSessionBase(basePath); err != nil { t.Fatalf("SetSessionBase failed: %v", err) } @@ -178,7 +180,7 @@ func TestHasSessionBase(t *testing.T) { } // Set a session base - basePath := "/home/user/project" + basePath := sessionBasePath if err := pkg.SetSessionBase(basePath); err != nil { t.Fatalf("SetSessionBase failed: %v", err) } diff --git a/api/go/novus_package/package_target_path_test.go b/api/go/novus_package/package_target_path_test.go index 28a1034b..9c1e1588 100644 --- a/api/go/novus_package/package_target_path_test.go +++ b/api/go/novus_package/package_target_path_test.go @@ -34,6 +34,8 @@ func TestSetTargetPath_Success(t *testing.T) { } // TestSetTargetPath_ErrorCases tests error conditions for SetTargetPath. 
+// +//nolint:gocognit // table-driven error cases func TestSetTargetPath_ErrorCases(t *testing.T) { ctx := context.Background() @@ -65,7 +67,7 @@ func TestSetTargetPath_ErrorCases(t *testing.T) { // Create a read-only directory tmpDir := t.TempDir() readOnlyDir := filepath.Join(tmpDir, "readonly") - if err := os.Mkdir(readOnlyDir, 0444); err != nil { + if err := os.Mkdir(readOnlyDir, 0o444); err != nil { t.Fatalf("Failed to create read-only directory: %v", err) } return filepath.Join(readOnlyDir, "file.nvpk") @@ -183,7 +185,7 @@ func TestSetTargetPath_PathCleaning(t *testing.T) { expectedClean := filepath.Join(tmpDir, "output.nvpk") // Create subdir so the parent validation works - if err := os.Mkdir(filepath.Join(tmpDir, "subdir"), 0755); err != nil { + if err := os.Mkdir(filepath.Join(tmpDir, "subdir"), 0o755); err != nil { t.Fatalf("Failed to create subdir: %v", err) } diff --git a/api/go/novus_package/package_test.go b/api/go/novus_package/package_test.go index 629ad290..0b98ab90 100644 --- a/api/go/novus_package/package_test.go +++ b/api/go/novus_package/package_test.go @@ -7,8 +7,10 @@ package novus_package import ( "context" "os" + "path/filepath" "testing" + "github.com/novus-engine/novuspack/api/go/fileformat/testutil" "github.com/novus-engine/novuspack/api/go/internal" "github.com/novus-engine/novuspack/api/go/internal/testhelpers" "github.com/novus-engine/novuspack/api/go/pkgerrors" @@ -30,6 +32,306 @@ func asPackageError(err error, target *pkgerrors.PackageError) bool { return false } +// runContextCancelledTest creates a package, calls the given method with a cancelled context, +// and asserts the error is a PackageError with ErrTypeContext. 
+func runContextCancelledTest(t *testing.T, call func(*filePackage, context.Context) error) { + t.Helper() + cancelledCtx := testhelpers.CancelledContext() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage() failed: %v", err) + } + defer func() { _ = pkg.Close() }() + fpkg := pkg.(*filePackage) + err = call(fpkg, cancelledCtx) + if err == nil { + t.Fatal("expected failure with cancelled context") + } + pkgErr := &pkgerrors.PackageError{} + if !asPackageError(err, pkgErr) { + t.Fatalf("Expected PackageError, got: %T", err) + } + if pkgErr.Type != pkgerrors.ErrTypeContext { + t.Errorf("Error type = %v, want ErrTypeContext", pkgErr.Type) + } +} + +func runReadFileExpectFail(t *testing.T, path string) { + t.Helper() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + ctx := context.Background() + _, err = pkg.ReadFile(ctx, path) + if err == nil { + t.Error("ReadFile should fail") + } +} + +func runGetMetadataBasic(t *testing.T) { + t.Helper() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + metadata, err := pkg.GetMetadata() + if err != nil { + t.Logf("GetMetadata note: %v (may require initialization)", err) + return + } + if metadata == nil { + t.Error("GetMetadata returned nil") + } +} + +// addThreeFilesFromMemory adds three files (file0.txt, file1.txt, file2.txt) via AddFileFromMemory. 
+func addThreeFilesFromMemory(t *testing.T, ctx context.Context, pkg Package, pathPrefix, contentPrefix string) { + t.Helper() + for i := 0; i < 3; i++ { + path := pathPrefix + "file" + string(rune('0'+i)) + ".txt" + data := []byte(contentPrefix + string(rune('0'+i))) + _, err := pkg.AddFileFromMemory(ctx, path, data, nil) + if err != nil { + t.Fatalf("AddFileFromMemory failed: %v", err) + } + } +} + +func runGetInfoBasic(t *testing.T, requireFormatVersion bool) { + t.Helper() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + info, err := pkg.GetInfo() + if err != nil { + t.Fatalf("GetInfo failed: %v", err) + } + if info == nil { + t.Fatal("GetInfo returned nil") + } + if requireFormatVersion && info.FormatVersion == 0 { + t.Error("Info.FormatVersion = 0, expected non-zero") + } +} + +func runOpenPackageThenCloseThen(t *testing.T, fn func(t *testing.T, pkg Package)) { + t.Helper() + ctx := context.Background() + tmpDir := t.TempDir() + pkgPath := filepath.Join(tmpDir, "test.nvpk") + testutil.CreateTestPackageFile(t, pkgPath) + pkg, err := OpenPackage(ctx, pkgPath) + if err != nil { + t.Fatalf("OpenPackage() failed: %v", err) + } + if err := pkg.Close(); err != nil { + t.Fatalf("Close() failed: %v", err) + } + fn(t, pkg) +} + +func runOpenPackageThenCloseThenSucceed(t *testing.T, methodName string, fn func(Package) (interface{}, error)) { + t.Helper() + runOpenPackageThenCloseThen(t, func(t *testing.T, pkg Package) { + got, err := fn(pkg) + if err != nil { + t.Fatalf("%s() should succeed on a closed package with cached metadata, got error: %v", methodName, err) + } + if got == nil { + t.Fatalf("%s() returned nil", methodName) + } + }) +} + +func runAssertGetInfoOnClosed(t *testing.T) { + t.Helper() + runOpenPackageThenCloseThenSucceed(t, "GetInfo", func(pkg Package) (interface{}, error) { return pkg.GetInfo() }) +} + +func runAssertListFilesOnClosed(t *testing.T) { + t.Helper() + runOpenPackageThenCloseThenSucceed(t, 
"ListFiles", func(pkg Package) (interface{}, error) { return pkg.ListFiles() }) +} + +func runOpenPackageListFilesExpectEmpty(t *testing.T) { + t.Helper() + ctx := context.Background() + tmpDir := t.TempDir() + pkgPath := filepath.Join(tmpDir, "test.nvpk") + testutil.CreateTestPackageFile(t, pkgPath) + pkg, err := OpenPackage(ctx, pkgPath) + if err != nil { + t.Fatalf("OpenPackage() failed: %v", err) + } + defer func() { _ = pkg.Close() }() + files, err := pkg.ListFiles() + if err != nil { + t.Fatalf("ListFiles() failed: %v", err) + } + if files == nil { + t.Fatal("ListFiles() should not return nil") + } + if len(files) != 0 { + t.Errorf("ListFiles() on empty package should return empty list, got %d files", len(files)) + } +} + +func runWriteWithContent(t *testing.T, content []byte, verifyFile bool) { + t.Helper() + ctx := context.Background() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + _, err = pkg.AddFileFromMemory(ctx, "/test.txt", content, nil) + if err != nil { + t.Fatalf("AddFileFromMemory failed: %v", err) + } + tmpPkg := filepath.Join(t.TempDir(), "test.pkg") + if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { + t.Fatalf("SetTargetPath failed: %v", err) + } + if err := pkg.Write(ctx); err != nil { + t.Fatalf("Write failed: %v", err) + } + if verifyFile { + if _, err := os.Stat(tmpPkg); os.IsNotExist(err) { + t.Error("Write did not create package file") + } + } +} + +func runWriteEmptyPackage(t *testing.T) { + t.Helper() + ctx := context.Background() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + tmpPkg := filepath.Join(t.TempDir(), "empty.pkg") + if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { + t.Fatalf("SetTargetPath failed: %v", err) + } + if err := pkg.Write(ctx); err != nil { + t.Fatalf("Write failed: %v", err) + } + + if _, err := os.Stat(tmpPkg); os.IsNotExist(err) { + t.Error("Write did not create package file") + } +} + +func 
runWriteContextCancelled(t *testing.T) { + t.Helper() + ctx := context.Background() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) + if err != nil { + t.Fatalf("AddFileFromMemory failed: %v", err) + } + tmpPkg := filepath.Join(t.TempDir(), "test.pkg") + if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { + t.Fatalf("SetTargetPath failed: %v", err) + } + cancelledCtx, cancel := context.WithCancel(context.Background()) + cancel() + if err := pkg.Write(cancelledCtx); err == nil { + t.Error("Write with cancelled context should fail") + } +} + +func runAddFileOverwrite(t *testing.T) { + t.Helper() + ctx := context.Background() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("v1"), nil) + if err != nil { + t.Fatalf("AddFileFromMemory(v1) failed: %v", err) + } + opts := &AddFileOptions{} + opts.AllowOverwrite.Set(true) + _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("v2"), opts) + if err != nil { + t.Logf("AddFileFromMemory with AllowOverwrite: %v (may not be fully implemented)", err) + } +} + +func runRemoveFileExpectFail(t *testing.T, path string) { + t.Helper() + ctx := context.Background() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + err = pkg.RemoveFile(ctx, path) + if err == nil { + t.Error("RemoveFile should fail") + } +} + +func runSafeWriteWithContent(t *testing.T) { + t.Helper() + ctx := context.Background() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) + if err != nil { + t.Fatalf("AddFileFromMemory failed: %v", err) + } + tmpPkg := filepath.Join(t.TempDir(), "test.pkg") + if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { + t.Fatalf("SetTargetPath failed: %v", err) + 
} + if err := pkg.SafeWrite(ctx, true); err != nil { + t.Fatalf("SafeWrite failed: %v", err) + } + + if _, err := os.Stat(tmpPkg); os.IsNotExist(err) { + t.Error("SafeWrite did not create package file") + } +} + +func runAddTwoPathsThenRemove(t *testing.T, storedPath2 string) { + t.Helper() + ctx := context.Background() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage failed: %v", err) + } + tmpDir := t.TempDir() + testFile := filepath.Join(tmpDir, "test.txt") + if err := os.WriteFile(testFile, []byte("content"), 0o644); err != nil { + t.Fatalf("Failed to create test file: %v", err) + } + entry1, err := pkg.AddFile(ctx, testFile, nil) + if err != nil { + t.Fatalf("AddFile failed: %v", err) + } + opts := &AddFileOptions{} + opts.StoredPath.Set(storedPath2) + entry2, err := pkg.AddFile(ctx, testFile, opts) + if err != nil { + t.Fatalf("AddFile with different path failed: %v", err) + } + if entry1.FileID != entry2.FileID { + t.Error("Deduplication should reuse same FileEntry") + } + err = pkg.RemoveFile(ctx, entry1.Paths[0].Path) + if err != nil { + t.Fatalf("RemoveFile failed: %v", err) + } +} + // ============================================================================= // TEST: NewPackage Constructor // ============================================================================= @@ -252,6 +554,8 @@ func TestPackageError_WithContext(t *testing.T) { } // TestNewPackageError tests the NewPackageError constructor. +// +//nolint:gocognit // table-driven error cases func TestNewPackageError(t *testing.T) { tests := []struct { name string @@ -319,35 +623,44 @@ func TestCheckContext_NilContext(t *testing.T) { } } -// TestCheckContext_CancelledContext tests checkContext with cancelled context. 
-func TestCheckContext_CancelledContext(t *testing.T) { - ctx := testhelpers.CancelledContext() +func assertCheckContextError(t *testing.T, ctx context.Context, wantType pkgerrors.ErrorType) { + t.Helper() err := internal.CheckContext(ctx, "test operation") if err == nil { - t.Error("checkContext() should return error for cancelled context") + t.Error("checkContext() should return error") } pkgErr := &pkgerrors.PackageError{} if !asPackageError(err, pkgErr) { t.Error("checkContext() should return PackageError") } - if pkgErr.Type != pkgerrors.ErrTypeContext { - t.Errorf("Error type = %v, want %v", pkgErr.Type, pkgerrors.ErrTypeContext) + if pkgErr.Type != wantType { + t.Errorf("Error type = %v, want %v", pkgErr.Type, wantType) } } +// TestCheckContext_CancelledContext tests checkContext with cancelled context. +func TestCheckContext_CancelledContext(t *testing.T) { + assertCheckContextError(t, testhelpers.CancelledContext(), pkgerrors.ErrTypeContext) +} + // TestCheckContext_TimeoutContext tests checkContext with timed out context. func TestCheckContext_TimeoutContext(t *testing.T) { - ctx := testhelpers.TimeoutContext() - err := internal.CheckContext(ctx, "test operation") - if err == nil { - t.Error("checkContext() should return error for timed out context") - } - pkgErr := &pkgerrors.PackageError{} - if !asPackageError(err, pkgErr) { - t.Error("checkContext() should return PackageError") + assertCheckContextError(t, testhelpers.TimeoutContext(), pkgerrors.ErrTypeContext) +} + +// runWithCancelledContext runs fn with a cancelled context and asserts it returns an error. 
+func runWithCancelledContext(t *testing.T, fn func(*filePackage, context.Context) (interface{}, error), opName string) { + t.Helper() + pkg, err := NewPackage() + if err != nil { + t.Fatalf("NewPackage() failed: %v", err) } - if pkgErr.Type != pkgerrors.ErrTypeContext { - t.Errorf("Error type = %v, want %v", pkgErr.Type, pkgerrors.ErrTypeContext) + defer func() { _ = pkg.Close() }() + cancelledCtx := testhelpers.CancelledContext() + fpkg := pkg.(*filePackage) + _, err = fn(fpkg, cancelledCtx) + if err == nil { + t.Errorf("%s should fail with cancelled context", opName) } } diff --git a/api/go/novus_package/package_types.go b/api/go/novus_package/package_types.go index ecfb9d58..c08515ac 100644 --- a/api/go/novus_package/package_types.go +++ b/api/go/novus_package/package_types.go @@ -28,7 +28,7 @@ import ( // All fields are populated from FileEntry static section (no variable-length data). // Paths have leading `/` stripped via internal.ToDisplayPath for user presentation. // -// Specification: api_core.md: 1.1.4.7 FileInfo Structure +// Specification: api_core.md: 1.2.4 FileInfo Structure type FileInfo struct { // Basic Identification PrimaryPath string // Primary display path (leading '/' removed, first path lexicographically) @@ -92,7 +92,7 @@ const ( // All fields use Option[T] types for optional configuration. Unset options use // implementation-defined defaults. // -// Specification: api_file_mgmt_addition.md: 2.8 AddFileOptions Configuration +// Specification: api_file_mgmt_addition.md: 2.8 AddFileOptions Struct type AddFileOptions struct { // Path determination options StoredPath generics.Option[string] // Explicit path override for storage location @@ -136,13 +136,56 @@ type AddFileOptions struct { ProgressCallback generics.Option[func(int64, int64)] // Progress callback (bytesProcessed, totalBytes) } +// RemoveDirectoryOptions configures directory removal behavior. 
+// +// Specification: api_file_mgmt_removal.md: 4.4 RemoveDirectoryOptions Struct +type RemoveDirectoryOptions struct { + // Recursive controls whether to remove files in subdirectories (default: true). + Recursive generics.Option[bool] + + // Pattern filters which files to remove (default: all files). + Pattern generics.Option[string] + + // RemoveEmptyDirs controls whether to remove directory metadata entries + // when all files in a directory are removed (default: true). + RemoveEmptyDirs generics.Option[bool] +} + // CompressionType represents the type of compression to use. // -// This type is used by PackageWriter interface methods. +// This type is used by package write operations. // Use constants from fileformat package (CompressionNone, CompressionZstd, etc.) // via the novuspack package re-exports. type CompressionType uint8 +// EncryptionAlgorithm represents the encryption algorithm identifier. +// +// Specification: api_security.md: 3.1.1 EncryptionAlgorithm Type +type EncryptionAlgorithm int + +const ( + EncryptionAlgorithmNone EncryptionAlgorithm = iota + EncryptionAlgorithmAES256GCM + EncryptionAlgorithmChaCha20Poly1305 + EncryptionAlgorithmMLKEM512 + EncryptionAlgorithmMLKEM768 + EncryptionAlgorithmMLKEM1024 +) + +// EncryptionType is a v1 alias of EncryptionAlgorithm. +// +// Specification: api_security.md: 3.1.2 EncryptionType Alias +type EncryptionType = EncryptionAlgorithm + +const ( + EncryptionNone EncryptionType = EncryptionAlgorithmNone + EncryptionAES256GCM EncryptionType = EncryptionAlgorithmAES256GCM + EncryptionChaCha20Poly1305 EncryptionType = EncryptionAlgorithmChaCha20Poly1305 + EncryptionMLKEM512 EncryptionType = EncryptionAlgorithmMLKEM512 + EncryptionMLKEM768 EncryptionType = EncryptionAlgorithmMLKEM768 + EncryptionMLKEM1024 EncryptionType = EncryptionAlgorithmMLKEM1024 +) + // EncryptionKey represents an encryption key for file encryption. // // This is a placeholder type for Priority 6 (Encryption Support). 
diff --git a/api/go/novus_package/package_version_tracking_test.go b/api/go/novus_package/package_version_tracking_test.go index 5b7eaf7b..50a6c7f2 100644 --- a/api/go/novus_package/package_version_tracking_test.go +++ b/api/go/novus_package/package_version_tracking_test.go @@ -116,8 +116,7 @@ func TestPackage_VersionTracking_SyncToHeader(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("Write failed: %v", err) } // Verify versions are preserved (would need to reopen to check header) @@ -145,15 +144,13 @@ func TestPackage_VersionTracking_SyncFromHeader(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("Write failed: %v", err) } // Reopen and verify versions are loaded from header pkg2, err := OpenPackage(ctx, tmpPkg) if err != nil { - t.Logf("OpenPackage failed: %v (may require complete Write implementation)", err) - return + t.Fatalf("OpenPackage failed: %v", err) } defer func() { _ = pkg2.Close() }() diff --git a/api/go/novus_package/package_write_integration_test.go b/api/go/novus_package/package_write_integration_test.go index 6bd5e850..2b1cbc62 100644 --- a/api/go/novus_package/package_write_integration_test.go +++ b/api/go/novus_package/package_write_integration_test.go @@ -27,7 +27,7 @@ func TestPackage_SafeWrite_InSubdirectory(t *testing.T) { // Create subdirectory tmpDir := t.TempDir() subDir := filepath.Join(tmpDir, "subdir") - if err := os.MkdirAll(subDir, 0755); err != nil { + if err := os.MkdirAll(subDir, 0o755); err != nil { t.Fatalf("Failed to create subdirectory: %v", err) } @@ -37,7 +37,7 @@ func TestPackage_SafeWrite_InSubdirectory(t *testing.T) { } if err := pkg.SafeWrite(ctx, true); err != nil { - t.Logf("SafeWrite failed: %v (implementation may be incomplete)", err) + t.Fatalf("SafeWrite failed: %v", err) } } @@ -59,15 +59,13 @@ func 
TestPackage_Write_ReopenAndModify(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("Write failed: %v", err) } // Reopen and modify pkg2, err := OpenPackage(ctx, tmpPkg) if err != nil { - t.Logf("OpenPackage failed: %v (may require complete Write implementation)", err) - return + t.Fatalf("OpenPackage failed: %v", err) } defer func() { _ = pkg2.Close() }() @@ -101,76 +99,20 @@ func TestPackage_WritePackageToFile_IndexBuilding(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) + t.Fatalf("Write failed: %v", err) } } func TestPackage_WritePackageToFile_HeaderUpdates(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - } + runWriteWithContent(t, []byte("content"), false) } func TestPackage_SafeWrite_AtomicRename(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - // SafeWrite uses atomic rename (temp file + rename) - if err := pkg.SafeWrite(ctx, true); err != nil { - t.Logf("SafeWrite failed: %v (implementation may be 
incomplete)", err) - } + runSafeWriteWithContent(t) } func TestPackage_WriteFile_UpdateExistingWithDifferentData(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Add first file - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("v1"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory(v1) failed: %v", err) - } - - // Add file with same path and AllowOverwrite - opts := &AddFileOptions{} - opts.AllowOverwrite.Set(true) - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("v2"), opts) - if err != nil { - t.Logf("AddFileFromMemory with AllowOverwrite: %v (may not be fully implemented)", err) - } + runAddFileOverwrite(t) } func TestPackage_RemoveFile_UpdatePathMetadata(t *testing.T) { @@ -193,18 +135,5 @@ func TestPackage_RemoveFile_UpdatePathMetadata(t *testing.T) { } func TestPackage_Write_EmptyPackage(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - tmpPkg := filepath.Join(t.TempDir(), "empty.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - } + runWriteEmptyPackage(t) } diff --git a/api/go/novus_package/package_writer.go b/api/go/novus_package/package_writer.go index 34159392..4a024abc 100644 --- a/api/go/novus_package/package_writer.go +++ b/api/go/novus_package/package_writer.go @@ -1,8 +1,8 @@ -// This file implements PackageWriter interface methods: Write, SafeWrite, FastWrite. +// This file implements Package write operations: Write, SafeWrite, FastWrite. // It contains all write operations for persisting package contents to disk as specified // in api_core.md and api_writing.md. 
// -// Note: StageFile and UnstageFile have been removed from the PackageWriter interface +// Note: StageFile and UnstageFile have been removed from the Package API // as per Priority 0 requirements. File management now uses AddFile/AddFileFromMemory/ // RemoveFile methods instead. // @@ -15,6 +15,9 @@ package novus_package import ( "context" + "encoding/binary" + "fmt" + "hash/crc32" "io" "os" "path/filepath" @@ -33,7 +36,7 @@ import ( // // This baseline implementation uses SafeWrite with overwrite=true. // -// Specification: api_core.md: 1.2 PackageWriter Interface +// Specification: api_core.md: 1.3 Package Write Operations // Specification: api_writing.md: 1. SafeWrite - Atomic Package Writing func (p *filePackage) Write(ctx context.Context) error { // Validate context @@ -62,8 +65,8 @@ func (p *filePackage) Write(ctx context.Context) error { // // This baseline implementation writes uncompressed and unencrypted files only. // -// Specification: api_core.md: 1.2 PackageWriter Interface -// Specification: api_writing.md: 5.3.3 Write Method Compression Handling +// Specification: api_core.md: 1.3 Package Write Operations +// Specification: api_writing.md: 5.3.3 Package.Write Method func (p *filePackage) SafeWrite(ctx context.Context, overwrite bool) error { // Validate context if err := internal.CheckContext(ctx, "SafeWrite"); err != nil { @@ -106,8 +109,9 @@ func (p *filePackage) SafeWrite(ctx context.Context, overwrite bool) error { }() // Write package to temp file - if writeErr = p.writePackageToFile(ctx, tempFile); writeErr != nil { - return writeErr + if err := p.writePackageToFile(ctx, tempFile); err != nil { + writeErr = err + return err } // Close temp file before rename @@ -140,7 +144,7 @@ func (p *filePackage) SafeWrite(ctx context.Context, overwrite bool) error { // // TODO: Implement fast package writing with state-driven compression and signing. 
// -// Specification: api_core.md: 1.2 PackageWriter Interface +// Specification: api_core.md: 1.3 Package Write Operations // Specification: api_writing.md: 1. SafeWrite - Atomic Package Writing func (p *filePackage) FastWrite(ctx context.Context) error { // TODO: Implement fast package writing @@ -152,7 +156,7 @@ func (p *filePackage) FastWrite(ctx context.Context) error { // // TODO: Implement package defragmentation. // -// Specification: api_core.md: 1.3 Package Interface +// Specification: api_basic_operations.md: 16. Package.Defragment Method func (p *filePackage) Defragment(ctx context.Context) error { // TODO: Implement defragmentation return pkgerrors.NewPackageError(pkgerrors.ErrTypeUnsupported, "Defragment not yet implemented", nil, struct{}{}) @@ -173,6 +177,8 @@ func (p *filePackage) Defragment(ctx context.Context) error { // // Returns: // - error: *PackageError on failure +// +//nolint:gocognit,gocyclo // write sequence branches func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) error { // Update PackageInfo (canonical in-memory metadata) if p.Info == nil { @@ -203,7 +209,7 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err // Write placeholder header (we'll update it later with correct offsets) headerSize := int64(fileformat.PackageHeaderSize) - if _, err := p.header.WriteTo(file); err != nil { + if _, err := writePackageHeader(file, p.header); err != nil { return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to write header placeholder") } @@ -221,6 +227,10 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err continue } + if fe.IsDataLoaded { + p.syncStoredMetadataFromMemory(fe) + } + // Record entry offset in index index.Entries = append(index.Entries, fileformat.IndexEntry{ FileID: fe.FileID, @@ -228,6 +238,7 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err }) // Write file entry metadata + entryOffset := 
currentOffset metaWritten, err := fe.WriteMetaTo(file) if err != nil { return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to write file entry metadata") @@ -235,14 +246,67 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err currentOffset += uint64(metaWritten) // Write file data - if fe.IsDataLoaded { + switch { + case fe.IsDataLoaded: // Write in-memory data n, err := file.Write(fe.Data) if err != nil { return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to write file data") } currentOffset += uint64(n) - } else if p.fileHandle != nil { + case fe.SourceFile != nil: + dataSize := fe.SourceSize + if dataSize == 0 { + dataSize = int64(fe.OriginalSize) + } + + if _, err := fe.SourceFile.Seek(fe.SourceOffset, io.SeekStart); err != nil { + return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to seek to source file data") + } + + needsChecksums := fe.RawChecksum == 0 || fe.StoredChecksum == 0 || fe.StoredSize == 0 + if needsChecksums { + hasher := crc32.NewIEEE() + writer := io.MultiWriter(file, hasher) + n, err := io.CopyN(writer, fe.SourceFile, dataSize) + if err != nil { + return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to copy file data from source") + } + if n != dataSize { + return pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "source file size mismatch during write", nil, pkgerrors.ValidationErrorContext{ + Field: "SourceSize", + Value: n, + Expected: fmt.Sprintf("%d", dataSize), + }) + } + + checksum := hasher.Sum32() + if fe.RawChecksum == 0 { + fe.RawChecksum = checksum + } + if fe.StoredChecksum == 0 { + fe.StoredChecksum = checksum + } + if fe.StoredSize == 0 { + fe.StoredSize = uint64(n) + } + + currentOffset += uint64(n) + + if err := p.rewriteFileEntryMeta(file, entryOffset, fe); err != nil { + return err + } + if _, err := file.Seek(int64(currentOffset), io.SeekStart); err != nil { + return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to seek back to end after 
metadata rewrite") + } + } else { + n, err := io.CopyN(file, fe.SourceFile, dataSize) + if err != nil { + return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to copy file data from source") + } + currentOffset += uint64(n) + } + case p.fileHandle != nil: // Stream existing file data from opened package // Find data offset in source file var sourceDataOffset uint64 @@ -266,6 +330,12 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err } currentOffset += uint64(n) } + default: + return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "file entry has no data source for writing", nil, pkgerrors.ValidationErrorContext{ + Field: "FileEntry", + Value: fe.FileID, + Expected: "data loaded, source file, or open package handle", + }) } // Check context cancellation periodically @@ -281,7 +351,7 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err // Write file index indexStart := currentOffset - indexWritten, err := index.WriteTo(file) + indexWritten, err := writeFileIndexTo(file, index) if err != nil { return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to write file index") } @@ -295,10 +365,9 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err if p.Info != nil && p.Info.Comment != "" { commentStart := currentOffset - // Create PackageComment and serialize properly - comment := metadata.NewPackageComment() - if err := comment.SetComment(p.Info.Comment); err != nil { - return pkgerrors.WrapError(err, pkgerrors.ErrTypeValidation, "failed to set comment for writing") + comment, err := buildPackageComment(p.Info.Comment) + if err != nil { + return pkgerrors.WrapError(err, pkgerrors.ErrTypeValidation, "failed to build comment for writing") } commentWritten, err := comment.WriteTo(file) @@ -323,7 +392,7 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to seek to 
beginning for header update") } - if _, err := p.header.WriteTo(file); err != nil { + if _, err := writePackageHeader(file, p.header); err != nil { return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to write updated header") } @@ -334,7 +403,19 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err if p.Info == nil { p.Info = metadata.NewPackageInfo() } - p.Info.FileCount = len(p.FileEntries) + fileCount := 0 + for _, fe := range p.FileEntries { + if fe == nil { + continue + } + if p.SpecialFiles != nil { + if _, ok := p.SpecialFiles[fe.Type]; ok { + continue + } + } + fileCount++ + } + p.Info.FileCount = fileCount // Note: FileCount doesn't sync to header because header doesn't have a FileCount field // The file count is derived from the file index when reading @@ -352,3 +433,98 @@ func (p *filePackage) writePackageToFile(ctx context.Context, file *os.File) err return nil } + +func (p *filePackage) syncStoredMetadataFromMemory(fe *metadata.FileEntry) { + if fe == nil || !fe.IsDataLoaded { + return + } + + if fe.StoredSize == 0 { + fe.StoredSize = uint64(len(fe.Data)) + } + if fe.OriginalSize == 0 { + fe.OriginalSize = fe.StoredSize + } + if fe.RawChecksum == 0 || fe.StoredChecksum == 0 { + checksum := internal.CalculateCRC32(fe.Data) + if fe.RawChecksum == 0 { + fe.RawChecksum = checksum + } + if fe.StoredChecksum == 0 { + fe.StoredChecksum = checksum + } + } +} + +func (p *filePackage) rewriteFileEntryMeta(file *os.File, entryOffset uint64, fe *metadata.FileEntry) error { + if _, err := file.Seek(int64(entryOffset), io.SeekStart); err != nil { + return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to seek to file entry metadata for rewrite") + } + if _, err := fe.WriteMetaTo(file); err != nil { + return pkgerrors.WrapError(err, pkgerrors.ErrTypeIO, "failed to rewrite file entry metadata") + } + return nil +} + +func writePackageHeader(w io.Writer, header *fileformat.PackageHeader) (int64, error) { + if header == nil 
{ + return 0, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "package header is nil", nil, struct{}{}) + } + if err := binary.Write(w, binary.LittleEndian, header); err != nil { + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write header", pkgerrors.ValidationErrorContext{ + Field: "Header", + Value: nil, + Expected: fmt.Sprintf("%d bytes", fileformat.PackageHeaderSize), + }) + } + return fileformat.PackageHeaderSize, nil +} + +func writeFileIndexTo(w io.Writer, index *fileformat.FileIndex) (int64, error) { + if index == nil { + return 0, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "file index is nil", nil, struct{}{}) + } + + var totalWritten int64 + index.EntryCount = uint32(len(index.Entries)) + + if err := binary.Write(w, binary.LittleEndian, index.EntryCount); err != nil { + return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write entry count", pkgerrors.ValidationErrorContext{ + Field: "EntryCount", + Value: index.EntryCount, + Expected: "written successfully", + }) + } + totalWritten += 4 + + if err := binary.Write(w, binary.LittleEndian, index.Reserved); err != nil { + return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write reserved", pkgerrors.ValidationErrorContext{ + Field: "Reserved", + Value: index.Reserved, + Expected: "written successfully", + }) + } + totalWritten += 4 + + if err := binary.Write(w, binary.LittleEndian, index.FirstEntryOffset); err != nil { + return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write first entry offset", pkgerrors.ValidationErrorContext{ + Field: "FirstEntryOffset", + Value: index.FirstEntryOffset, + Expected: "written successfully", + }) + } + totalWritten += 8 + + for i, entry := range index.Entries { + if err := binary.Write(w, binary.LittleEndian, entry); err != nil { + return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, 
fmt.Sprintf("failed to write entry %d", i), pkgerrors.ValidationErrorContext{ + Field: "Entries", + Value: i, + Expected: "written successfully", + }) + } + totalWritten += fileformat.IndexEntrySize + } + + return totalWritten, nil +} diff --git a/api/go/novus_package/package_writer_comprehensive_test.go b/api/go/novus_package/package_writer_comprehensive_test.go index ae71bd97..1f021042 100644 --- a/api/go/novus_package/package_writer_comprehensive_test.go +++ b/api/go/novus_package/package_writer_comprehensive_test.go @@ -13,28 +13,7 @@ import ( ) func TestPackage_WriteFile_ContextCancellation(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - cancelledCtx, cancel := context.WithCancel(context.Background()) - cancel() - - if err := pkg.Write(cancelledCtx); err == nil { - t.Error("Write with cancelled context should fail") - } + runWriteContextCancelled(t) } func TestPackage_RemoveFile_ContextCancellation(t *testing.T) { @@ -58,28 +37,7 @@ func TestPackage_RemoveFile_ContextCancellation(t *testing.T) { } func TestPackage_Write_ContextCancellation(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - cancelledCtx, cancel := context.WithCancel(context.Background()) - cancel() - - if err := 
pkg.Write(cancelledCtx); err == nil { - t.Error("Write with cancelled context should fail") - } + runWriteContextCancelled(t) } func TestPackage_SafeWrite_ContextCancellation(t *testing.T) { @@ -125,7 +83,7 @@ func TestPackage_SafeWrite_FileExistsNoOverwrite(t *testing.T) { } // Create existing file - if err := os.WriteFile(tmpPkg, []byte("existing"), 0644); err != nil { + if err := os.WriteFile(tmpPkg, []byte("existing"), 0o644); err != nil { t.Fatalf("Failed to create existing file: %v", err) } @@ -136,42 +94,11 @@ func TestPackage_SafeWrite_FileExistsNoOverwrite(t *testing.T) { } func TestPackage_WritePackageToFile_WithComment(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - } + runWriteWithContent(t, []byte("content"), false) } func TestPackage_WritePackageToFile_EmptyPackageWithComment(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - tmpPkg := filepath.Join(t.TempDir(), "empty.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - } + runWriteEmptyPackage(t) } func TestPackage_WritePackageToFile_MultipleFilesVariousSizes(t *testing.T) { @@ -202,88 +129,16 @@ func TestPackage_WritePackageToFile_MultipleFilesVariousSizes(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v 
(implementation may be incomplete)", err) + t.Fatalf("Write failed: %v", err) } } func TestPackage_RemoveFile_AllPaths(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Add file with first path - tmpDir := t.TempDir() - testFile := filepath.Join(tmpDir, "test.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { - t.Fatalf("Failed to create test file: %v", err) - } - - entry1, err := pkg.AddFile(ctx, testFile, nil) - if err != nil { - t.Fatalf("AddFile failed: %v", err) - } - - // Add same file with different path (deduplication) - opts := &AddFileOptions{} - opts.StoredPath.Set("/path2.txt") - entry2, err := pkg.AddFile(ctx, testFile, opts) - if err != nil { - t.Fatalf("AddFile with different path failed: %v", err) - } - - // Should be same entry - if entry1.FileID != entry2.FileID { - t.Error("Deduplication should reuse same FileEntry") - } - - // Remove file (removes all paths) - err = pkg.RemoveFile(ctx, entry1.Paths[0].Path) - if err != nil { - t.Fatalf("RemoveFile failed: %v", err) - } + runAddTwoPathsThenRemove(t, "/path2.txt") } func TestPackage_RemoveFile_OnePath(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Add file with first path - tmpDir := t.TempDir() - testFile := filepath.Join(tmpDir, "test.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { - t.Fatalf("Failed to create test file: %v", err) - } - - entry1, err := pkg.AddFile(ctx, testFile, nil) - if err != nil { - t.Fatalf("AddFile failed: %v", err) - } - - // Add same file with different path (deduplication) - opts := &AddFileOptions{} - opts.StoredPath.Set("/path2.txt") - entry2, err := pkg.AddFile(ctx, testFile, opts) - if err != nil { - t.Fatalf("AddFile with different path failed: %v", err) - } - - // Should be same entry with multiple paths - if 
entry1.FileID != entry2.FileID { - t.Error("Deduplication should reuse same FileEntry") - } - - // Remove one path (file should still exist with other path) - err = pkg.RemoveFile(ctx, entry1.Paths[0].Path) - if err != nil { - t.Fatalf("RemoveFile failed: %v", err) - } + runAddTwoPathsThenRemove(t, "/path2.txt") } func TestPackage_WriteFile_PathValidation(t *testing.T) { @@ -314,26 +169,7 @@ func TestPackage_WriteFile_PathValidation(t *testing.T) { } func TestPackage_SafeWrite_TempFileCreation(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - // SafeWrite creates temp file then renames - if err := pkg.SafeWrite(ctx, true); err != nil { - t.Logf("SafeWrite failed: %v (implementation may be incomplete)", err) - } + runSafeWriteWithContent(t) } func TestPackage_Write_UpdatesInfo(t *testing.T) { @@ -365,8 +201,7 @@ func TestPackage_Write_UpdatesInfo(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("Write failed: %v", err) } // Info should reflect file count diff --git a/api/go/novus_package/package_writer_edge_cases_test.go b/api/go/novus_package/package_writer_edge_cases_test.go index 127bf479..98b63d3c 100644 --- a/api/go/novus_package/package_writer_edge_cases_test.go +++ b/api/go/novus_package/package_writer_edge_cases_test.go @@ -23,7 +23,7 @@ func TestPackage_WritePackageToFile_StreamFromDisk(t *testing.T) { // Add file from disk (streaming) tmpDir := t.TempDir() testFile := filepath.Join(tmpDir, "test.txt") - if err := os.WriteFile(testFile, []byte("content from disk"), 0644); 
err != nil { + if err := os.WriteFile(testFile, []byte("content from disk"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -38,33 +38,12 @@ func TestPackage_WritePackageToFile_StreamFromDisk(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) + t.Fatalf("Write failed: %v", err) } } func TestPackage_WritePackageToFile_ContextCancelDuringWrite(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - cancelledCtx, cancel := context.WithCancel(context.Background()) - cancel() - - if err := pkg.Write(cancelledCtx); err == nil { - t.Error("Write with cancelled context should fail") - } + runWriteContextCancelled(t) } func TestPackage_WritePackageToFile_NilFileEntry(t *testing.T) { @@ -106,7 +85,7 @@ func TestPackage_WritePackageToFile_LargeFileStreaming(t *testing.T) { for i := range largeData { largeData[i] = byte(i % 256) } - if err := os.WriteFile(largeFile, largeData, 0644); err != nil { + if err := os.WriteFile(largeFile, largeData, 0o644); err != nil { t.Fatalf("Failed to create large file: %v", err) } @@ -121,7 +100,7 @@ func TestPackage_WritePackageToFile_LargeFileStreaming(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) + t.Fatalf("Write failed: %v", err) } } @@ -136,7 +115,7 @@ func TestPackage_WriteFile_WithMultiplePaths(t *testing.T) { // Add file with first path tmpDir := t.TempDir() testFile := filepath.Join(tmpDir, "test.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { 
+ if err := os.WriteFile(testFile, []byte("content"), 0o644); err != nil { t.Fatalf("Failed to create test file: %v", err) } @@ -164,34 +143,7 @@ func TestPackage_WriteFile_WithMultiplePaths(t *testing.T) { } func TestPackage_ReadFile_StreamingPath(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Create package to open it (required for ReadFile) - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.Create(ctx, tmpPkg); err != nil { - t.Fatalf("Create failed: %v", err) - } - - testContent := []byte("streaming content") - entry, err := pkg.AddFileFromMemory(ctx, "/test.txt", testContent, nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - // Read file - data, err := pkg.ReadFile(ctx, entry.Paths[0].Path) - if err != nil { - t.Fatalf("ReadFile failed: %v", err) - } - - if string(data) != string(testContent) { - t.Errorf("ReadFile content mismatch: got %q, want %q", string(data), string(testContent)) - } + runReadFileFromDisk(t, []byte("streaming content")) } func TestPackage_ReadFile_CompressedFile(t *testing.T) { @@ -223,12 +175,11 @@ func TestPackage_Write_MultipleWriteCycles(t *testing.T) { // First write if err := pkg.Write(ctx); err != nil { - t.Logf("First Write failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("First Write failed: %v", err) } // Second write (overwrite) if err := pkg.Write(ctx); err != nil { - t.Logf("Second Write failed: %v (implementation may be incomplete)", err) + t.Fatalf("Second Write failed: %v", err) } } diff --git a/api/go/novus_package/package_writer_test.go b/api/go/novus_package/package_writer_test.go index 9d9d6999..6e2a159c 100644 --- a/api/go/novus_package/package_writer_test.go +++ b/api/go/novus_package/package_writer_test.go @@ -6,6 +6,7 @@ package novus_package import ( + "bytes" "context" "os" "path/filepath" @@ -13,34 +14,7 @@ import ( ) func TestPackage_WriteFile(t 
*testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Add file using AddFileFromMemory (simpler and more reliable) - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("test content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - // Set target path and write - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - return - } - - // Verify file was created - if _, err := os.Stat(tmpPkg); os.IsNotExist(err) { - t.Error("Write did not create package file") - } + runWriteWithContent(t, []byte("test content"), true) } func TestPackage_WriteFile_ThenReadFile(t *testing.T) { @@ -65,15 +39,13 @@ func TestPackage_WriteFile_ThenReadFile(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("Write failed: %v", err) } // Open and read the package pkg2, err := OpenPackage(ctx, tmpPkg) if err != nil { - t.Logf("OpenPackage failed: %v (may require complete Write implementation)", err) - return + t.Fatalf("OpenPackage failed: %v", err) } defer func() { _ = pkg2.Close() }() @@ -83,7 +55,7 @@ func TestPackage_WriteFile_ThenReadFile(t *testing.T) { t.Fatalf("ReadFile failed: %v", err) } - if string(data) != string(testContent) { + if !bytes.Equal(data, testContent) { t.Errorf("ReadFile content mismatch: got %q, want %q", string(data), string(testContent)) } } @@ -118,34 +90,7 @@ func TestPackage_RemoveFile(t *testing.T) { } func TestPackage_Write(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Add file using AddFileFromMemory - _, err = pkg.AddFileFromMemory(ctx, 
"/test.txt", []byte("content"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - - // Set target path and write - tmpPkg := filepath.Join(t.TempDir(), "test.pkg") - if err := pkg.SetTargetPath(ctx, tmpPkg); err != nil { - t.Fatalf("SetTargetPath failed: %v", err) - } - - if err := pkg.Write(ctx); err != nil { - t.Logf("Write failed: %v (implementation may be incomplete)", err) - return - } - - // Verify file was created - if _, err := os.Stat(tmpPkg); os.IsNotExist(err) { - t.Error("Write did not create package file") - } + runWriteWithContent(t, []byte("content"), true) } func TestPackage_SafeWrite(t *testing.T) { @@ -169,8 +114,7 @@ func TestPackage_SafeWrite(t *testing.T) { } if err := pkg.SafeWrite(ctx, true); err != nil { - t.Logf("SafeWrite failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("SafeWrite failed: %v", err) } // Verify file was created @@ -201,15 +145,13 @@ func TestPackage_SafeWrite_RoundTrip(t *testing.T) { } if err := pkg.SafeWrite(ctx, true); err != nil { - t.Logf("SafeWrite failed: %v (implementation may be incomplete)", err) - return + t.Fatalf("SafeWrite failed: %v", err) } // Open and read the package pkg2, err := OpenPackage(ctx, tmpPkg) if err != nil { - t.Logf("OpenPackage failed: %v (may require complete Write implementation)", err) - return + t.Fatalf("OpenPackage failed: %v", err) } defer func() { _ = pkg2.Close() }() @@ -219,7 +161,7 @@ func TestPackage_SafeWrite_RoundTrip(t *testing.T) { t.Fatalf("ReadFile failed: %v", err) } - if string(data) != string(testContent) { + if !bytes.Equal(data, testContent) { t.Errorf("ReadFile content mismatch: got %q, want %q", string(data), string(testContent)) } } @@ -248,41 +190,11 @@ func TestPackage_WriteFile_InvalidPath(t *testing.T) { } func TestPackage_WriteFile_UpdateExisting(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Add first file - _, err 
= pkg.AddFileFromMemory(ctx, "/test.txt", []byte("v1"), nil) - if err != nil { - t.Fatalf("AddFileFromMemory(v1) failed: %v", err) - } - - // Add file with same path and AllowOverwrite - opts := &AddFileOptions{} - opts.AllowOverwrite.Set(true) - _, err = pkg.AddFileFromMemory(ctx, "/test.txt", []byte("v2"), opts) - if err != nil { - t.Logf("AddFileFromMemory with AllowOverwrite: %v (may not be fully implemented)", err) - } + runAddFileOverwrite(t) } func TestPackage_RemoveFile_NonExistent(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Try to remove non-existent file - err = pkg.RemoveFile(ctx, "/nonexistent.txt") - if err == nil { - t.Error("RemoveFile with non-existent path should fail") - } + runRemoveFileExpectFail(t, "/nonexistent.txt") } func TestPackage_SafeWrite_NoFilePath(t *testing.T) { @@ -321,7 +233,7 @@ func TestPackage_SafeWrite_NoOverwrite(t *testing.T) { } // Create existing file - if err := os.WriteFile(tmpPkg, []byte("existing"), 0644); err != nil { + if err := os.WriteFile(tmpPkg, []byte("existing"), 0o644); err != nil { t.Fatalf("Failed to create existing file: %v", err) } @@ -348,58 +260,11 @@ func TestPackage_Write_NoFilePath(t *testing.T) { } func TestPackage_RemoveFile_WithMultiplePaths(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Add file with first path - tmpDir := t.TempDir() - testFile := filepath.Join(tmpDir, "test.txt") - if err := os.WriteFile(testFile, []byte("content"), 0644); err != nil { - t.Fatalf("Failed to create test file: %v", err) - } - - entry1, err := pkg.AddFile(ctx, testFile, nil) - if err != nil { - t.Fatalf("AddFile failed: %v", err) - } - - // Add same file with different path (deduplication should add path to existing entry) - opts := &AddFileOptions{} - opts.StoredPath.Set("/different/path.txt") - entry2, err := 
pkg.AddFile(ctx, testFile, opts) - if err != nil { - t.Fatalf("AddFile with different path failed: %v", err) - } - - // Should be same entry (deduplication) - if entry1.FileID != entry2.FileID { - t.Error("Deduplication should reuse same FileEntry") - } - - // Remove file using one path - err = pkg.RemoveFile(ctx, entry1.Paths[0].Path) - if err != nil { - t.Fatalf("RemoveFile failed: %v", err) - } + runAddTwoPathsThenRemove(t, "/different/path.txt") } func TestPackage_RemoveFile_InvalidPath(t *testing.T) { - pkg, err := NewPackage() - if err != nil { - t.Fatalf("NewPackage failed: %v", err) - } - - ctx := context.Background() - - // Try to remove with invalid path - err = pkg.RemoveFile(ctx, "") - if err == nil { - t.Error("RemoveFile with empty path should fail") - } + runRemoveFileExpectFail(t, "") } func TestPackage_WritePackageToFile_EmptyPackage(t *testing.T) { @@ -441,7 +306,7 @@ func TestPackage_WriteFile_LargeFile(t *testing.T) { for i := range largeData { largeData[i] = byte(i % 256) } - if err := os.WriteFile(largeFile, largeData, 0644); err != nil { + if err := os.WriteFile(largeFile, largeData, 0o644); err != nil { t.Fatalf("Failed to create large file: %v", err) } @@ -457,7 +322,7 @@ func TestPackage_WriteFile_LargeFile(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write with large file failed: %v (implementation may be incomplete)", err) + t.Fatalf("Write with large file failed: %v", err) } } @@ -469,15 +334,7 @@ func TestPackage_SafeWrite_MultipleFiles(t *testing.T) { ctx := context.Background() - // Add multiple files - for i := 0; i < 3; i++ { - path := "/file" + string(rune('0'+i)) + ".txt" - data := []byte("content " + string(rune('0'+i))) - _, err := pkg.AddFileFromMemory(ctx, path, data, nil) - if err != nil { - t.Fatalf("AddFileFromMemory failed: %v", err) - } - } + addThreeFilesFromMemory(t, ctx, pkg, "/", "content ") // Set target path and safe write tmpPkg := filepath.Join(t.TempDir(), "multi.pkg") @@ -513,7 +370,7 @@ 
func TestPackage_Write_WithPathMetadata(t *testing.T) { } if err := pkg.Write(ctx); err != nil { - t.Logf("Write with path metadata failed: %v (implementation may be incomplete)", err) + t.Fatalf("Write with path metadata failed: %v", err) } } @@ -538,7 +395,7 @@ func TestPackage_SafeWrite_Overwrite(t *testing.T) { } // Create existing file - if err := os.WriteFile(tmpPkg, []byte("existing"), 0644); err != nil { + if err := os.WriteFile(tmpPkg, []byte("existing"), 0o644); err != nil { t.Fatalf("Failed to create existing file: %v", err) } diff --git a/api/go/novuspack.go b/api/go/novuspack.go index 82aee8b5..1bb95431 100644 --- a/api/go/novuspack.go +++ b/api/go/novuspack.go @@ -34,18 +34,17 @@ import ( // Re-export core package interfaces from novus_package type ( Package = novus_package.Package - PackageReader = novus_package.PackageReader - PackageWriter = novus_package.PackageWriter PackageBuilder = novus_package.PackageBuilder ) // Re-export package operation types from novus_package type ( - FileInfo = novus_package.FileInfo - AddFileOptions = novus_package.AddFileOptions - CreateOptions = novus_package.CreateOptions - CompressionType = novus_package.CompressionType - EncryptionType = novus_package.EncryptionType + FileInfo = novus_package.FileInfo + AddFileOptions = novus_package.AddFileOptions + RemoveDirectoryOptions = novus_package.RemoveDirectoryOptions + CreateOptions = novus_package.CreateOptions + CompressionType = novus_package.CompressionType + EncryptionType = novus_package.EncryptionType ) // Re-export types from pkgerrors @@ -441,7 +440,7 @@ func Err[T any](err error) Result[T] { // Returns the normalized path with leading "/" or an error if the path is // invalid or would escape the package root. 
// -// Specification: api_core.md: 1.1.2 Package Path Semantics +// Specification: api_core.md: 12.1 NormalizePackagePath Function func NormalizePackagePath(path string) (string, error) { return internal.NormalizePackagePath(path) } @@ -449,7 +448,7 @@ func NormalizePackagePath(path string) (string, error) { // ToDisplayPath converts a stored package path (with leading "/") to display // format by stripping the leading slash. Use for user-facing path display. // -// Specification: api_core.md: 1.1.2 Package Path Semantics +// Specification: api_core.md: 12.2 ToDisplayPath Function func ToDisplayPath(storedPath string) string { return internal.ToDisplayPath(storedPath) } @@ -460,7 +459,7 @@ func ToDisplayPath(storedPath string) string { // Returns warnings for paths over 260 (Windows), 1024 (macOS), 4096 (Linux) // bytes; error only when over the absolute maximum. // -// Specification: api_generics.md: 1.3.3.9 Path Length Limits and Warnings +// Specification: api_core.md: 12.4 ValidatePathLength Function func ValidatePathLength(path string) ([]string, error) { return internal.ValidatePathLength(path) } @@ -468,7 +467,7 @@ func ValidatePathLength(path string) ([]string, error) { // ValidatePackagePath validates a package-internal path: non-empty, no // root-escape via dot segments, valid format. Delegates to NormalizePackagePath. // -// Specification: api_core.md: 1.1.2.2 Path Validation +// Specification: api_core.md: 12.3 ValidatePackagePath Function func ValidatePackagePath(path string) error { return internal.ValidatePackagePath(path) } @@ -506,7 +505,7 @@ func ValidatePackagePath(path string) error { // } // defer pkg.Close() // -// Specification: api_basic_operations.md: 6.1 Package Constructor +// Specification: api_basic_operations.md: 6.1 NewPackage Behavior func NewPackage() (Package, error) { return novus_package.NewPackage() } @@ -527,8 +526,6 @@ func NewPackage() (Package, error) { // WithVendorID(0x12345678). // WithAppID(0x87654321). 
// Build(ctx) -// -// Specification: api_basic_operations.md: 6.5 PackageBuilder Pattern func NewBuilder() PackageBuilder { return novus_package.NewBuilder() } @@ -567,7 +564,7 @@ func NewBuilder() PackageBuilder { // } // fmt.Printf("Files: %d\n", info.FileCount) // -// Specification: api_basic_operations.md: 7.1 OpenPackage +// Specification: api_basic_operations.md: 10. OpenPackage Function func OpenPackage(ctx context.Context, path string) (Package, error) { return novus_package.OpenPackage(ctx, path) } @@ -607,7 +604,7 @@ func OpenPackage(ctx context.Context, path string) (Package, error) { // err = pkg.StageFile(ctx, "new.txt", []byte("data"), nil) // // err is *PackageError with Type == ErrTypeSecurity // -// Specification: api_basic_operations.md: 7.2 OpenPackageReadOnly +// Specification: api_basic_operations.md: 11.2 OpenPackageReadOnly Function func OpenPackageReadOnly(ctx context.Context, path string) (Package, error) { return novus_package.OpenPackageReadOnly(ctx, path) } @@ -654,7 +651,7 @@ func OpenPackageReadOnly(ctx context.Context, path string) (Package, error) { // defer pkg.Close() // // Attempt to extract whatever data is accessible // -// Specification: api_basic_operations.md: 7.3 OpenBrokenPackage +// Specification: api_basic_operations.md: 12. OpenBrokenPackage Function func OpenBrokenPackage(ctx context.Context, path string) (Package, error) { return novus_package.OpenBrokenPackage(ctx, path) } @@ -697,7 +694,7 @@ func OpenBrokenPackage(ctx context.Context, path string) (Package, error) { // } // fmt.Printf("Format Version: %d\n", header.FormatVersion) // -// Specification: api_basic_operations.md: 9.4 Header Inspection +// Specification: api_basic_operations.md: 18. 
Header Inspection func ReadHeader(ctx context.Context, reader io.Reader) (*fileformat.PackageHeader, error) { return novus_package.ReadHeader(ctx, reader) } @@ -742,7 +739,7 @@ func ReadHeader(ctx context.Context, reader io.Reader) (*fileformat.PackageHeade // fmt.Printf("Magic: 0x%08X\n", header.Magic) // fmt.Printf("Index Start: %d\n", header.IndexStart) // -// Specification: api_basic_operations.md: 9.4 Header Inspection +// Specification: api_basic_operations.md: 18. Header Inspection func ReadHeaderFromPath(ctx context.Context, path string) (*fileformat.PackageHeader, error) { return novus_package.ReadHeaderFromPath(ctx, path) } diff --git a/api/go/novuspack_test.go b/api/go/novuspack_test.go index dcf4c91e..1d3e034c 100644 --- a/api/go/novuspack_test.go +++ b/api/go/novuspack_test.go @@ -70,7 +70,7 @@ func TestValidateWith_DifferentTypes(t *testing.T) { // String type rule := &ValidationRule[string]{ - Predicate: func(s string) bool { return len(s) > 0 }, + Predicate: func(s string) bool { return s != "" }, Message: "string cannot be empty", } err := ValidateWith(ctx, "test", rule) @@ -111,22 +111,22 @@ func TestValidateAll(t *testing.T) { // All valid validator := &testValidator{shouldFail: false} values := []string{"test1", "test2", "test3"} - errors := ValidateAll(ctx, values, validator) - if len(errors) != 0 { - t.Errorf("ValidateAll should return no errors for all valid values, got %d errors", len(errors)) + errs := ValidateAll(ctx, values, validator) + if len(errs) != 0 { + t.Errorf("ValidateAll should return no errors for all valid values, got %d errors", len(errs)) } // Some invalid validator = &testValidator{shouldFail: true} - errors = ValidateAll(ctx, values, validator) - if len(errors) != len(values) { - t.Errorf("ValidateAll should return error for each invalid value, got %d errors", len(errors)) + errs = ValidateAll(ctx, values, validator) + if len(errs) != len(values) { + t.Errorf("ValidateAll should return error for each invalid value, got %d 
errors", len(errs)) } // Empty slice - errors = ValidateAll(ctx, []string{}, validator) - if len(errors) != 0 { - t.Errorf("ValidateAll should return no errors for empty slice, got %d errors", len(errors)) + errs = ValidateAll(ctx, []string{}, validator) + if len(errs) != 0 { + t.Errorf("ValidateAll should return no errors for empty slice, got %d errors", len(errs)) } } @@ -135,12 +135,12 @@ func TestValidateAll_NilValidator(t *testing.T) { ctx := context.Background() values := []string{"test1", "test2"} - errors := ValidateAll(ctx, values, nil) - if len(errors) != len(values) { - t.Errorf("ValidateAll should return error for each value when validator is nil, got %d errors", len(errors)) + errs := ValidateAll(ctx, values, nil) + if len(errs) != len(values) { + t.Errorf("ValidateAll should return error for each value when validator is nil, got %d errors", len(errs)) } expectedErrMsg := "[Validation] validator is nil" - for _, err := range errors { + for _, err := range errs { if err.Error() != expectedErrMsg { t.Errorf("ValidateAll should return specific error for nil validator, got %q, want %q", err.Error(), expectedErrMsg) } @@ -158,10 +158,10 @@ func TestValidateAll_MixedResults(t *testing.T) { } values := []string{"valid1", "invalid", "valid2", "invalid"} - errors := ValidateAll(ctx, values, validator) + errs := ValidateAll(ctx, values, validator) - if len(errors) != 2 { - t.Errorf("ValidateAll should return 2 errors, got %d", len(errors)) + if len(errs) != 2 { + t.Errorf("ValidateAll should return 2 errors, got %d", len(errs)) } } @@ -176,10 +176,10 @@ func TestValidateAll_DifferentTypes(t *testing.T) { } values := []int{1, 2, 3, -1, -2} - errors := ValidateAll(ctx, values, intRule) + errs := ValidateAll(ctx, values, intRule) - if len(errors) != 2 { - t.Errorf("ValidateAll should return 2 errors for invalid ints, got %d", len(errors)) + if len(errs) != 2 { + t.Errorf("ValidateAll should return 2 errors for invalid ints, got %d", len(errs)) } } @@ -191,7 +191,7 
@@ func TestValidateAll_DifferentTypes(t *testing.T) { func TestComposeValidators(t *testing.T) { // Create two validators validator1 := &ValidationRule[string]{ - Predicate: func(s string) bool { return len(s) > 0 }, + Predicate: func(s string) bool { return s != "" }, Message: "string cannot be empty", } validator2 := &ValidationRule[string]{ @@ -242,7 +242,7 @@ func TestComposeValidators_Empty(t *testing.T) { func TestComposeValidators_MultipleTypes(t *testing.T) { // String validators strValidator1 := &ValidationRule[string]{ - Predicate: func(s string) bool { return len(s) > 0 }, + Predicate: func(s string) bool { return s != "" }, Message: "string cannot be empty", } strValidator2 := &ValidationRule[string]{ diff --git a/api/go/pkgerrors/pkgerrors_test.go b/api/go/pkgerrors/pkgerrors_test.go index 7247d062..0f5b2a4d 100644 --- a/api/go/pkgerrors/pkgerrors_test.go +++ b/api/go/pkgerrors/pkgerrors_test.go @@ -10,6 +10,9 @@ import ( "github.com/novus-engine/novuspack/api/go/internal/testhelpers" ) +const testErrMsg = "test error" +const testFieldVal = "test" + // TestErrorType_String tests the ErrorType String method. 
func TestErrorType_String(t *testing.T) { tests := []struct { @@ -90,19 +93,19 @@ func TestPackageError_Error(t *testing.T) { name: "error with cause", pkgErr: &PackageError{ Type: ErrTypeValidation, - Message: "test error", + Message: testErrMsg, Cause: errors.New("underlying error"), }, - contains: []string{"[Validation]", "test error", "underlying error"}, + contains: []string{"[Validation]", testErrMsg, "underlying error"}, }, { name: "error without cause", pkgErr: &PackageError{ Type: ErrTypeIO, - Message: "test error", + Message: testErrMsg, Cause: nil, }, - contains: []string{"[IO]", "test error"}, + contains: []string{"[IO]", testErrMsg}, }, { name: "error with context", @@ -143,7 +146,7 @@ func TestPackageError_Unwrap(t *testing.T) { name: "error with cause", pkgErr: &PackageError{ Type: ErrTypeValidation, - Message: "test error", + Message: testErrMsg, Cause: underlyingErr, }, expected: underlyingErr, @@ -152,7 +155,7 @@ func TestPackageError_Unwrap(t *testing.T) { name: "error without cause", pkgErr: &PackageError{ Type: ErrTypeIO, - Message: "test error", + Message: testErrMsg, Cause: nil, }, expected: nil, @@ -184,7 +187,7 @@ func TestPackageError_Is(t *testing.T) { name: "error with cause that matches target", pkgErr: &PackageError{ Type: ErrTypeValidation, - Message: "test error", + Message: testErrMsg, Cause: targetErr, }, target: targetErr, @@ -194,7 +197,7 @@ func TestPackageError_Is(t *testing.T) { name: "error with cause that doesn't match target", pkgErr: &PackageError{ Type: ErrTypeIO, - Message: "test error", + Message: testErrMsg, Cause: otherErr, }, target: targetErr, @@ -204,7 +207,7 @@ func TestPackageError_Is(t *testing.T) { name: "error without cause", pkgErr: &PackageError{ Type: ErrTypeSecurity, - Message: "test error", + Message: testErrMsg, Cause: nil, }, target: targetErr, @@ -214,7 +217,7 @@ func TestPackageError_Is(t *testing.T) { name: "error with nil target", pkgErr: &PackageError{ Type: ErrTypeValidation, - Message: "test 
error", + Message: testErrMsg, Cause: targetErr, }, target: nil, @@ -236,6 +239,12 @@ func TestPackageError_Is(t *testing.T) { func TestNewPackageError(t *testing.T) { underlyingErr := errors.New("underlying error") + validateWithCause := func(e *PackageError) bool { + return e.Type == ErrTypeValidation && e.Message == testErrMsg && e.Cause == underlyingErr && e.Context != nil + } + validateWithoutCause := func(e *PackageError) bool { + return e.Type == ErrTypeIO && e.Message == testErrMsg && e.Cause == nil && e.Context != nil + } tests := []struct { name string errType ErrorType @@ -243,48 +252,30 @@ func TestNewPackageError(t *testing.T) { cause error validate func(*PackageError) bool }{ - { - name: "with cause", - errType: ErrTypeValidation, - message: "test error", - cause: underlyingErr, - validate: func(e *PackageError) bool { - return e.Type == ErrTypeValidation && - e.Message == "test error" && - e.Cause == underlyingErr && - e.Context != nil - }, - }, - { - name: "without cause", - errType: ErrTypeIO, - message: "test error", - cause: nil, - validate: func(e *PackageError) bool { - return e.Type == ErrTypeIO && - e.Message == "test error" && - e.Cause == nil && - e.Context != nil - }, - }, + {"with cause", ErrTypeValidation, testErrMsg, underlyingErr, validateWithCause}, + {"without cause", ErrTypeIO, testErrMsg, nil, validateWithoutCause}, } for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - result := NewPackageError[struct{}](tt.errType, tt.message, tt.cause, struct{}{}) - if result == nil { - t.Fatal("NewPackageError() returned nil") - } - if !tt.validate(result) { - t.Errorf("NewPackageError() validation failed: %+v", result) - } - }) + runPackageErrorCase(t, tt.name, NewPackageError[struct{}](tt.errType, tt.message, tt.cause, struct{}{}), tt.validate) } } +func runPackageErrorCase(t *testing.T, name string, result *PackageError, validate func(*PackageError) bool) { + t.Helper() + t.Run(name, func(t *testing.T) { + if result == nil { + 
t.Fatal("result is nil") + } + if !validate(result) { + t.Errorf("validation failed: %+v", result) + } + }) +} + // TestPackageError_WithContext tests the WithContext method. func TestPackageError_WithContext(t *testing.T) { - pkgErr := NewPackageError[struct{}](ErrTypeValidation, "test error", nil, struct{}{}) + pkgErr := NewPackageError[struct{}](ErrTypeValidation, testErrMsg, nil, struct{}{}) result := pkgErr.WithContext("key1", "value1") if result != pkgErr { @@ -403,7 +394,7 @@ func TestIsPackageError(t *testing.T) { // TestAs tests the As function. func TestAs(t *testing.T) { - pkgErr := NewPackageError[struct{}](ErrTypeValidation, "test", nil, struct{}{}) + pkgErr := NewPackageError[struct{}](ErrTypeValidation, testFieldVal, nil, struct{}{}) standardErr := errors.New("standard error") tests := []struct { @@ -454,7 +445,7 @@ func TestAs(t *testing.T) { // TestGetErrorType tests the GetErrorType function. func TestGetErrorType(t *testing.T) { - pkgErr := NewPackageError[struct{}](ErrTypeSecurity, "test", nil, struct{}{}) + pkgErr := NewPackageError[struct{}](ErrTypeSecurity, testFieldVal, nil, struct{}{}) standardErr := errors.New("standard error") tests := []struct { @@ -498,7 +489,7 @@ func TestGetErrorType(t *testing.T) { // TestAddErrorContext tests the AddErrorContext generic function. func TestAddErrorContext(t *testing.T) { - pkgErr := NewPackageError[struct{}](ErrTypeValidation, "test", nil, struct{}{}) + pkgErr := NewPackageError[struct{}](ErrTypeValidation, testFieldVal, nil, struct{}{}) standardErr := errors.New("standard error") tests := []struct { @@ -570,7 +561,7 @@ func TestAddErrorContext(t *testing.T) { // TestGetErrorContext tests the GetErrorContext generic function. 
func TestGetErrorContext(t *testing.T) { - pkgErr := NewPackageError[struct{}](ErrTypeValidation, "test", nil, struct{}{}) + pkgErr := NewPackageError[struct{}](ErrTypeValidation, testFieldVal, nil, struct{}{}) pkgErr.Context["string_key"] = "string_value" pkgErr.Context["int_key"] = 42 pkgErr.Context["wrong_type"] = "not_an_int" @@ -660,9 +651,15 @@ func TestNewTypedPackageError(t *testing.T) { Field2 int } - ctx := TestContext{Field1: "test", Field2: 42} + ctx := TestContext{Field1: testFieldVal, Field2: 42} underlyingErr := errors.New("underlying") - + makeValidateTyped := func(wantType ErrorType, wantMsg string, wantCause error) func(*PackageError) bool { + return func(e *PackageError) bool { + typedCtx, ok := GetErrorContext[TestContext](e, "_typed_context") + return e.Type == wantType && e.Message == wantMsg && e.Cause == wantCause && ok && + typedCtx.Field1 == testFieldVal && typedCtx.Field2 == 42 + } + } tests := []struct { name string errType ErrorType @@ -671,50 +668,12 @@ func TestNewTypedPackageError(t *testing.T) { context TestContext validate func(*PackageError) bool }{ - { - name: "with cause", - errType: ErrTypeValidation, - message: "test error", - cause: underlyingErr, - context: ctx, - validate: func(e *PackageError) bool { - typedCtx, ok := GetErrorContext[TestContext](e, "_typed_context") - return e.Type == ErrTypeValidation && - e.Message == "test error" && - e.Cause == underlyingErr && - ok && - typedCtx.Field1 == "test" && - typedCtx.Field2 == 42 - }, - }, - { - name: "without cause", - errType: ErrTypeIO, - message: "test error", - cause: nil, - context: ctx, - validate: func(e *PackageError) bool { - typedCtx, ok := GetErrorContext[TestContext](e, "_typed_context") - return e.Type == ErrTypeIO && - e.Message == "test error" && - e.Cause == nil && - ok && - typedCtx.Field1 == "test" && - typedCtx.Field2 == 42 - }, - }, + {"with cause", ErrTypeValidation, testErrMsg, underlyingErr, ctx, makeValidateTyped(ErrTypeValidation, testErrMsg, 
underlyingErr)}, + {"without cause", ErrTypeIO, testErrMsg, nil, ctx, makeValidateTyped(ErrTypeIO, testErrMsg, nil)}, } for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { - result := NewTypedPackageError(tt.errType, tt.message, tt.cause, tt.context) - if result == nil { - t.Fatal("NewTypedPackageError() returned nil") - } - if !tt.validate(result) { - t.Errorf("NewTypedPackageError() validation failed: %+v", result) - } - }) + runPackageErrorCase(t, tt.name, NewTypedPackageError(tt.errType, tt.message, tt.cause, tt.context), tt.validate) } } @@ -726,8 +685,13 @@ func TestWrapErrorWithContext(t *testing.T) { standardErr := errors.New("standard error") pkgErr := NewPackageError[struct{}](ErrTypeValidation, "original", nil, struct{}{}) - ctx := TestContext{Value: "test"} - + ctx := TestContext{Value: testFieldVal} + validateWrapCtx := func(wantType ErrorType, wantMsg string, wantCause error) func(*PackageError) bool { + return func(e *PackageError) bool { + typedCtx, ok := GetErrorContext[TestContext](e, "_typed_context") + return e.Type == wantType && e.Message == wantMsg && e.Cause == wantCause && ok && typedCtx.Value == testFieldVal + } + } tests := []struct { name string err error @@ -736,50 +700,12 @@ func TestWrapErrorWithContext(t *testing.T) { context TestContext validate func(*PackageError) bool }{ - { - name: "wrap standard error", - err: standardErr, - errType: ErrTypeIO, - message: "wrapped", - context: ctx, - validate: func(e *PackageError) bool { - typedCtx, ok := GetErrorContext[TestContext](e, "_typed_context") - return e.Type == ErrTypeIO && - e.Message == "wrapped" && - e.Cause == standardErr && - ok && - typedCtx.Value == "test" - }, - }, - { - name: "wrap PackageError", - err: pkgErr, - errType: ErrTypeSecurity, - message: "updated", - context: ctx, - validate: func(e *PackageError) bool { - typedCtx, ok := GetErrorContext[TestContext](e, "_typed_context") - return e.Type == ErrTypeSecurity && - e.Message == "updated" && - ok && - 
typedCtx.Value == "test" - }, - }, - { - name: "wrap nil error", - err: nil, - errType: ErrTypeValidation, - message: "new error", - context: ctx, - validate: func(e *PackageError) bool { - typedCtx, ok := GetErrorContext[TestContext](e, "_typed_context") - return e.Type == ErrTypeValidation && - e.Message == "new error" && - e.Cause == nil && - ok && - typedCtx.Value == "test" - }, - }, + {"wrap standard error", standardErr, ErrTypeIO, "wrapped", ctx, validateWrapCtx(ErrTypeIO, "wrapped", standardErr)}, + {"wrap PackageError", pkgErr, ErrTypeSecurity, "updated", ctx, func(e *PackageError) bool { + typedCtx, ok := GetErrorContext[TestContext](e, "_typed_context") + return e.Type == ErrTypeSecurity && e.Message == "updated" && ok && typedCtx.Value == testFieldVal + }}, + {"wrap nil error", nil, ErrTypeValidation, "new error", ctx, validateWrapCtx(ErrTypeValidation, "new error", nil)}, } for _, tt := range tests { @@ -805,7 +731,7 @@ func TestMapError(t *testing.T) { } sourceCtx := SourceContext{Value: 42} - pkgErr := NewTypedPackageError(ErrTypeValidation, "test", nil, sourceCtx) + pkgErr := NewTypedPackageError(ErrTypeValidation, testFieldVal, nil, sourceCtx) standardErr := errors.New("standard error") tests := []struct { diff --git a/api/go/signatures/signature.go b/api/go/signatures/signature.go index bbacad7d..4544c63a 100644 --- a/api/go/signatures/signature.go +++ b/api/go/signatures/signature.go @@ -4,7 +4,7 @@ // related to signature operations as specified in api_signatures.md Section 1 // and Section 2. // -// Specification: api_signatures.md: 1. Signature Management +// Specification: api_signatures.md: 4.1.4.1 Signature Struct // Package novuspack provides signatures domain structures for the NovusPack implementation. // @@ -27,31 +27,31 @@ import ( // Specification: package_file_format.md: 8.1 Signature Structure type Signature struct { // SignatureType is the signature algorithm identifier - // Specification: package_file_format.md: 1. 
`.nvpk` File Format Overview + // Specification: package_file_format.md: 8.1 Signature Structure SignatureType uint32 // SignatureSize is the size of signature data in bytes - // Specification: package_file_format.md: 7.1 Package Comment Format Specification + // Specification: package_file_format.md: 8.1 Signature Structure SignatureSize uint32 // SignatureFlags contains signature-specific metadata - // Specification: package_file_format.md: 1. `.nvpk` File Format Overview + // Specification: package_file_format.md: 8.1 Signature Structure SignatureFlags uint32 // SignatureTimestamp is the signature creation time (Unix nanoseconds) - // Specification: package_file_format.md: 1. `.nvpk` File Format Overview + // Specification: package_file_format.md: 8.1 Signature Structure SignatureTimestamp uint32 // CommentLength is the length of signature comment - // Specification: package_file_format.md: 1. `.nvpk` File Format Overview + // Specification: package_file_format.md: 8.1 Signature Structure CommentLength uint16 // SignatureComment is a human-readable comment about the signature - // Specification: package_file_format.md: 7.1 Package Comment Format Specification + // Specification: package_file_format.md: 8.1 Signature Structure SignatureComment string // SignatureData contains the raw signature bytes - // Specification: package_file_format.md: 7.1 Package Comment Format Specification + // Specification: package_file_format.md: 8.1 Signature Structure SignatureData []byte } @@ -64,7 +64,7 @@ type Signature struct { // - SignatureData must not be nil or empty // // Returns an error if any validation check fails. 
-func (s *Signature) Validate() error { +func (s *Signature) validate() error { if s.SignatureType == 0 { return pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "signature type cannot be zero", nil, pkgerrors.ValidationErrorContext{ Field: "SignatureType", @@ -111,7 +111,7 @@ func (s *Signature) Validate() error { // Size returns the total size of the Signature in bytes. // // Specification: package_file_format.md: 7.1 Package Comment Format Specification -func (s *Signature) Size() int { +func (s *Signature) size() int { // Type(4) + Size(4) + Flags(4) + Timestamp(4) + CommentLength(2) + Comment + Data return 18 + int(s.CommentLength) + int(s.SignatureSize) } @@ -136,118 +136,104 @@ func NewSignature() *Signature { // - SignatureComment (CommentLength bytes, UTF-8 string with null terminator) // - SignatureData (SignatureSize bytes) // -// Returns the number of bytes read and any error encountered. -// -// Specification: package_file_format.md: 8.1 Signature Structure -func (s *Signature) ReadFrom(r io.Reader) (int64, error) { - var totalRead int64 - - // Read SignatureType (4 bytes) +// readSignatureHeader reads the 18-byte fixed header into s; returns bytes read and error. 
+func readSignatureHeader(r io.Reader, s *Signature) (int64, error) { if err := binary.Read(r, binary.LittleEndian, &s.SignatureType); err != nil { if err == io.EOF || err == io.ErrUnexpectedEOF { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to read signature type: incomplete data", pkgerrors.ValidationErrorContext{ - Field: "SignatureType", - Value: totalRead, - Expected: "4 bytes", + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeCorruption, "failed to read signature type: incomplete data", pkgerrors.ValidationErrorContext{ + Field: "SignatureType", Value: int64(0), Expected: "4 bytes", }) } - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature type", pkgerrors.ValidationErrorContext{ - Field: "SignatureType", - Value: nil, - Expected: "4 bytes", + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature type", pkgerrors.ValidationErrorContext{ + Field: "SignatureType", Value: nil, Expected: "4 bytes", }) } - totalRead += 4 - - // Read SignatureSize (4 bytes) if err := binary.Read(r, binary.LittleEndian, &s.SignatureSize); err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature size", pkgerrors.ValidationErrorContext{ - Field: "SignatureSize", - Value: nil, - Expected: "4 bytes", + return 4, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature size", pkgerrors.ValidationErrorContext{ + Field: "SignatureSize", Value: nil, Expected: "4 bytes", }) } - totalRead += 4 - - // Read SignatureFlags (4 bytes) if err := binary.Read(r, binary.LittleEndian, &s.SignatureFlags); err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature flags", pkgerrors.ValidationErrorContext{ - Field: "SignatureFlags", - Value: nil, - Expected: "4 bytes", + return 8, pkgerrors.WrapErrorWithContext(err, 
pkgerrors.ErrTypeIO, "failed to read signature flags", pkgerrors.ValidationErrorContext{ + Field: "SignatureFlags", Value: nil, Expected: "4 bytes", }) } - totalRead += 4 - - // Read SignatureTimestamp (4 bytes) if err := binary.Read(r, binary.LittleEndian, &s.SignatureTimestamp); err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature timestamp", pkgerrors.ValidationErrorContext{ - Field: "SignatureTimestamp", - Value: nil, - Expected: "4 bytes", + return 12, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature timestamp", pkgerrors.ValidationErrorContext{ + Field: "SignatureTimestamp", Value: nil, Expected: "4 bytes", }) } - totalRead += 4 - - // Read CommentLength (2 bytes) if err := binary.Read(r, binary.LittleEndian, &s.CommentLength); err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read comment length", pkgerrors.ValidationErrorContext{ - Field: "CommentLength", - Value: nil, - Expected: "2 bytes", + return 16, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read comment length", pkgerrors.ValidationErrorContext{ + Field: "CommentLength", Value: nil, Expected: "2 bytes", }) } - totalRead += 2 + return 18, nil +} - // Read SignatureComment (CommentLength bytes) - if s.CommentLength > 0 { - commentBytes := make([]byte, s.CommentLength) - n, err := io.ReadFull(r, commentBytes) - if err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature comment", pkgerrors.ValidationErrorContext{ - Field: "SignatureComment", - Value: s.CommentLength, - Expected: "comment data", - }) - } - if uint16(n) != s.CommentLength { - return totalRead, pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "incomplete comment read", nil, pkgerrors.ValidationErrorContext{ - Field: "SignatureComment", - Value: n, - Expected: fmt.Sprintf("%d bytes", s.CommentLength), - }) - 
} - totalRead += int64(n) - s.SignatureComment = string(commentBytes) - } else { +// readSignatureComment reads CommentLength bytes into s.SignatureComment; returns bytes read and error. +func readSignatureComment(r io.Reader, s *Signature) (int64, error) { + if s.CommentLength == 0 { s.SignatureComment = "" + return 0, nil + } + commentBytes := make([]byte, s.CommentLength) + n, err := io.ReadFull(r, commentBytes) + if err != nil { + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature comment", pkgerrors.ValidationErrorContext{ + Field: "SignatureComment", Value: s.CommentLength, Expected: "comment data", + }) + } + if uint16(n) != s.CommentLength { + return int64(n), pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "incomplete comment read", nil, pkgerrors.ValidationErrorContext{ + Field: "SignatureComment", Value: n, Expected: fmt.Sprintf("%d bytes", s.CommentLength), + }) } + s.SignatureComment = string(commentBytes) + return int64(n), nil +} - // Read SignatureData (SignatureSize bytes) - if s.SignatureSize > 0 { - signatureData := make([]byte, s.SignatureSize) - n, err := io.ReadFull(r, signatureData) - if err != nil { - return totalRead, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature data", pkgerrors.ValidationErrorContext{ - Field: "SignatureData", - Value: s.SignatureSize, - Expected: "signature data", - }) - } - if uint32(n) != s.SignatureSize { - return totalRead, pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "incomplete signature data read", nil, pkgerrors.ValidationErrorContext{ - Field: "SignatureData", - Value: n, - Expected: fmt.Sprintf("%d bytes", s.SignatureSize), - }) - } - totalRead += int64(n) - s.SignatureData = signatureData - } else { +// readSignatureData reads SignatureSize bytes into s.SignatureData; returns bytes read and error. 
+func readSignatureData(r io.Reader, s *Signature) (int64, error) { + if s.SignatureSize == 0 { s.SignatureData = nil + return 0, nil + } + signatureData := make([]byte, s.SignatureSize) + n, err := io.ReadFull(r, signatureData) + if err != nil { + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to read signature data", pkgerrors.ValidationErrorContext{ + Field: "SignatureData", Value: s.SignatureSize, Expected: "signature data", + }) + } + if uint32(n) != s.SignatureSize { + return int64(n), pkgerrors.NewPackageError(pkgerrors.ErrTypeCorruption, "incomplete signature data read", nil, pkgerrors.ValidationErrorContext{ + Field: "SignatureData", Value: n, Expected: fmt.Sprintf("%d bytes", s.SignatureSize), + }) } + s.SignatureData = signatureData + return int64(n), nil +} - return totalRead, nil +// Returns the number of bytes read and any error encountered. +// +// Specification: package_file_format.md: 8.1 Signature Structure +func (s *Signature) readFrom(r io.Reader) (int64, error) { + n, err := readSignatureHeader(r, s) + if err != nil { + return n, err + } + totalRead := n + n, err = readSignatureComment(r, s) + if err != nil { + return totalRead, err + } + totalRead += n + n, err = readSignatureData(r, s) + if err != nil { + return totalRead, err + } + return totalRead + n, nil } // WriteTo writes a Signature to the provided io.Writer. @@ -266,139 +252,123 @@ func (s *Signature) ReadFrom(r io.Reader) (int64, error) { // // Returns the number of bytes written and any error encountered. 
// -// Specification: package_file_format.md: 8.1 Signature Structure -func (s *Signature) WriteTo(w io.Writer) (int64, error) { - var totalWritten int64 - - // Update CommentLength and SignatureSize to match actual data - s.CommentLength = uint16(len(s.SignatureComment)) - s.SignatureSize = uint32(len(s.SignatureData)) - - // Write SignatureType (4 bytes) +// writeSignatureHeader writes the 18-byte fixed header; s.CommentLength and s.SignatureSize must be set. +func writeSignatureHeader(w io.Writer, s *Signature) (int64, error) { if err := binary.Write(w, binary.LittleEndian, s.SignatureType); err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature type", pkgerrors.ValidationErrorContext{ - Field: "SignatureType", - Value: s.SignatureType, - Expected: "written successfully", + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature type", pkgerrors.ValidationErrorContext{ + Field: "SignatureType", Value: s.SignatureType, Expected: "written successfully", }) } - totalWritten += 4 - - // Write SignatureSize (4 bytes) if err := binary.Write(w, binary.LittleEndian, s.SignatureSize); err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature size", pkgerrors.ValidationErrorContext{ - Field: "SignatureSize", - Value: s.SignatureSize, - Expected: "written successfully", + return 4, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature size", pkgerrors.ValidationErrorContext{ + Field: "SignatureSize", Value: s.SignatureSize, Expected: "written successfully", }) } - totalWritten += 4 - - // Write SignatureFlags (4 bytes) if err := binary.Write(w, binary.LittleEndian, s.SignatureFlags); err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature flags", pkgerrors.ValidationErrorContext{ - Field: "SignatureFlags", - Value: 
s.SignatureFlags, - Expected: "written successfully", + return 8, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature flags", pkgerrors.ValidationErrorContext{ + Field: "SignatureFlags", Value: s.SignatureFlags, Expected: "written successfully", }) } - totalWritten += 4 - - // Write SignatureTimestamp (4 bytes) if err := binary.Write(w, binary.LittleEndian, s.SignatureTimestamp); err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature timestamp", pkgerrors.ValidationErrorContext{ - Field: "SignatureTimestamp", - Value: s.SignatureTimestamp, - Expected: "written successfully", + return 12, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature timestamp", pkgerrors.ValidationErrorContext{ + Field: "SignatureTimestamp", Value: s.SignatureTimestamp, Expected: "written successfully", }) } - totalWritten += 4 - - // Write CommentLength (2 bytes) if err := binary.Write(w, binary.LittleEndian, s.CommentLength); err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write comment length", pkgerrors.ValidationErrorContext{ - Field: "CommentLength", - Value: s.CommentLength, - Expected: "written successfully", + return 16, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write comment length", pkgerrors.ValidationErrorContext{ + Field: "CommentLength", Value: s.CommentLength, Expected: "written successfully", }) } - totalWritten += 2 + return 18, nil +} - // Write SignatureComment (CommentLength bytes) - if s.CommentLength > 0 { - commentBytes := []byte(s.SignatureComment) - if uint16(len(commentBytes)) != s.CommentLength { - return totalWritten, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "comment length mismatch", nil, pkgerrors.ValidationErrorContext{ - Field: "CommentLength", - Value: len(commentBytes), - Expected: fmt.Sprintf("%d", s.CommentLength), - }) - } - n, err := 
w.Write(commentBytes) - if err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature comment", pkgerrors.ValidationErrorContext{ - Field: "SignatureComment", - Value: s.SignatureComment, - Expected: "written successfully", - }) - } - if uint16(n) != s.CommentLength { - return totalWritten, pkgerrors.NewPackageError(pkgerrors.ErrTypeIO, "incomplete comment write", nil, pkgerrors.ValidationErrorContext{ - Field: "SignatureComment", - Value: n, - Expected: fmt.Sprintf("%d bytes", s.CommentLength), - }) - } - totalWritten += int64(n) +// writeSignatureComment writes s.SignatureComment (s.CommentLength bytes); returns bytes written and error. +func writeSignatureComment(w io.Writer, s *Signature) (int64, error) { + if s.CommentLength == 0 { + return 0, nil + } + commentBytes := []byte(s.SignatureComment) + if uint16(len(commentBytes)) != s.CommentLength { + return 0, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "comment length mismatch", nil, pkgerrors.ValidationErrorContext{ + Field: "CommentLength", Value: len(commentBytes), Expected: fmt.Sprintf("%d", s.CommentLength), + }) + } + n, err := w.Write(commentBytes) + if err != nil { + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature comment", pkgerrors.ValidationErrorContext{ + Field: "SignatureComment", Value: s.SignatureComment, Expected: "written successfully", + }) } + if uint16(n) != s.CommentLength { + return int64(n), pkgerrors.NewPackageError(pkgerrors.ErrTypeIO, "incomplete comment write", nil, pkgerrors.ValidationErrorContext{ + Field: "SignatureComment", Value: n, Expected: fmt.Sprintf("%d bytes", s.CommentLength), + }) + } + return int64(n), nil +} - // Write SignatureData (SignatureSize bytes) - if s.SignatureSize > 0 { - if uint32(len(s.SignatureData)) != s.SignatureSize { - return totalWritten, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "signature size mismatch", nil, 
pkgerrors.ValidationErrorContext{ - Field: "SignatureSize", - Value: len(s.SignatureData), - Expected: fmt.Sprintf("%d", s.SignatureSize), - }) - } - n, err := w.Write(s.SignatureData) - if err != nil { - return totalWritten, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature data", pkgerrors.ValidationErrorContext{ - Field: "SignatureData", - Value: s.SignatureData, - Expected: "written successfully", - }) - } - if uint32(n) != s.SignatureSize { - return totalWritten, pkgerrors.NewPackageError(pkgerrors.ErrTypeIO, "incomplete signature data write", nil, pkgerrors.ValidationErrorContext{ - Field: "SignatureData", - Value: n, - Expected: fmt.Sprintf("%d bytes", s.SignatureSize), - }) - } - totalWritten += int64(n) +// writeSignatureData writes s.SignatureData (s.SignatureSize bytes); returns bytes written and error. +func writeSignatureData(w io.Writer, s *Signature) (int64, error) { + if s.SignatureSize == 0 { + return 0, nil + } + if uint32(len(s.SignatureData)) != s.SignatureSize { + return 0, pkgerrors.NewPackageError(pkgerrors.ErrTypeValidation, "signature size mismatch", nil, pkgerrors.ValidationErrorContext{ + Field: "SignatureSize", Value: len(s.SignatureData), Expected: fmt.Sprintf("%d", s.SignatureSize), + }) } + n, err := w.Write(s.SignatureData) + if err != nil { + return 0, pkgerrors.WrapErrorWithContext(err, pkgerrors.ErrTypeIO, "failed to write signature data", pkgerrors.ValidationErrorContext{ + Field: "SignatureData", Value: s.SignatureData, Expected: "written successfully", + }) + } + if uint32(n) != s.SignatureSize { + return int64(n), pkgerrors.NewPackageError(pkgerrors.ErrTypeIO, "incomplete signature data write", nil, pkgerrors.ValidationErrorContext{ + Field: "SignatureData", Value: n, Expected: fmt.Sprintf("%d bytes", s.SignatureSize), + }) + } + return int64(n), nil +} - return totalWritten, nil +// Specification: package_file_format.md: 8.1 Signature Structure +func (s *Signature) writeTo(w io.Writer) 
(int64, error) { + s.CommentLength = uint16(len(s.SignatureComment)) + s.SignatureSize = uint32(len(s.SignatureData)) + n, err := writeSignatureHeader(w, s) + if err != nil { + return n, err + } + totalWritten := n + n, err = writeSignatureComment(w, s) + if err != nil { + return totalWritten, err + } + totalWritten += n + n, err = writeSignatureData(w, s) + if err != nil { + return totalWritten, err + } + return totalWritten + n, nil } // HasFlag checks if a specific signature flag is set. // // Specification: package_file_format.md: 8.2.2 SignatureFlags Field -func (s *Signature) HasFlag(flag uint32) bool { +func (s *Signature) hasFlag(flag uint32) bool { return (s.SignatureFlags & flag) != 0 } // SetFlag sets a specific signature flag. // // Specification: package_file_format.md: 8.2.2 SignatureFlags Field -func (s *Signature) SetFlag(flag uint32) { +func (s *Signature) setFlag(flag uint32) { s.SignatureFlags |= flag } // ClearFlag clears a specific signature flag. // // Specification: package_file_format.md: 8.2.2 SignatureFlags Field -func (s *Signature) ClearFlag(flag uint32) { +func (s *Signature) clearFlag(flag uint32) { s.SignatureFlags &= ^flag } diff --git a/api/go/signatures/signature_test.go b/api/go/signatures/signature_test.go index d9b36df0..1331c0e1 100644 --- a/api/go/signatures/signature_test.go +++ b/api/go/signatures/signature_test.go @@ -11,6 +11,47 @@ import ( "github.com/novus-engine/novuspack/api/go/internal/testhelpers" ) +const sigTestComment = "test comment" + +// signatureHeaderThenErrorReader returns a reader that yields the fixed signature header, lastU16, then an error. 
+func signatureHeaderThenErrorReader(lastU16 uint16) io.Reader { + buf := new(bytes.Buffer) + _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType + _ = binary.Write(buf, binary.LittleEndian, uint32(64)) // SignatureSize + _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags + _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureTimestamp + _ = binary.Write(buf, binary.LittleEndian, lastU16) + return io.MultiReader(buf, testhelpers.NewErrorReader()) +} + +// signatureHeaderBytes builds the 18-byte signature header (type, size, flags, timestamp, commentLen). +func signatureHeaderBytes(sigType, sigSize, flags, ts uint32, commentLen uint16) []byte { + buf := new(bytes.Buffer) + _ = binary.Write(buf, binary.LittleEndian, sigType) + _ = binary.Write(buf, binary.LittleEndian, sigSize) + _ = binary.Write(buf, binary.LittleEndian, flags) + _ = binary.Write(buf, binary.LittleEndian, ts) + _ = binary.Write(buf, binary.LittleEndian, commentLen) + return buf.Bytes() +} + +// signatureHeaderWithCommentAndPartialData returns header (type 1, size 64, 0, 0, 10) + full comment + partialData bytes. +func signatureHeaderWithCommentAndPartialData(partialDataLen int) []byte { + buf := new(bytes.Buffer) + buf.Write(signatureHeaderBytes(1, 64, 0, 0, 10)) + buf.WriteString(sigTestComment[:10]) + buf.Write(make([]byte, partialDataLen)) + return buf.Bytes() +} + +// signatureHeaderWithCommentBytes returns header (type 1, size 64, 0, 0, 10) + comment[:n]. 
+func signatureHeaderWithCommentBytes(n int) []byte { + buf := new(bytes.Buffer) + buf.Write(signatureHeaderBytes(1, 64, 0, 0, 10)) + buf.WriteString(sigTestComment[:n]) + return buf.Bytes() +} + // TestSignatureValidation verifies validation logic func TestSignatureValidation(t *testing.T) { tests := []struct { @@ -103,9 +144,9 @@ func TestSignatureValidation(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { - err := tt.signature.Validate() + err := tt.signature.validate() if (err != nil) != tt.wantErr { - t.Errorf("Validate() error = %v, wantErr %v", err, tt.wantErr) + t.Errorf("validate() error = %v, wantErr %v", err, tt.wantErr) } }) } @@ -131,8 +172,8 @@ func TestSignatureSizeCalculation(t *testing.T) { CommentLength: tt.commentLength, } - if sig.Size() != tt.wantSize { - t.Errorf("Size() = %d, want %d", sig.Size(), tt.wantSize) + if sig.size() != tt.wantSize { + t.Errorf("size() = %d, want %d", sig.size(), tt.wantSize) } }) } @@ -143,14 +184,14 @@ func TestSignatureFlags(t *testing.T) { sig := Signature{} // Test setting flag - sig.SetFlag(0x01) - if !sig.HasFlag(0x01) { + sig.setFlag(0x01) + if !sig.hasFlag(0x01) { t.Error("Expected flag 0x01 to be set") } // Test clearing flag - sig.ClearFlag(0x01) - if sig.HasFlag(0x01) { + sig.clearFlag(0x01) + if sig.hasFlag(0x01) { t.Error("Expected flag 0x01 to be cleared") } } @@ -188,6 +229,8 @@ func TestNewSignature(t *testing.T) { // TestSignatureReadFrom verifies ReadFrom deserialization // Specification: package_file_format.md: 8.1 Signature Structure +// +//nolint:gocognit // table-driven read cases func TestSignatureReadFrom(t *testing.T) { tests := []struct { name string @@ -230,14 +273,14 @@ func TestSignatureReadFrom(t *testing.T) { // Serialize using WriteTo var writeBuf bytes.Buffer - _, writeErr := tt.sig.WriteTo(&writeBuf) + _, writeErr := tt.sig.writeTo(&writeBuf) if writeErr != nil { t.Fatalf("WriteTo() error = %v", writeErr) } // Deserialize using ReadFrom var sig 
Signature - n, err := sig.ReadFrom(&writeBuf) + n, err := sig.readFrom(&writeBuf) if (err != nil) != tt.wantErr { t.Errorf("ReadFrom() error = %v, wantErr %v", err, tt.wantErr) @@ -245,7 +288,7 @@ func TestSignatureReadFrom(t *testing.T) { } if !tt.wantErr { - expectedSize := tt.sig.Size() + expectedSize := tt.sig.size() if n != int64(expectedSize) { t.Errorf("ReadFrom() read %d bytes, want %d", n, expectedSize) } @@ -268,8 +311,8 @@ func TestSignatureReadFrom(t *testing.T) { } // Verify validation passes - if err := sig.Validate(); err != nil { - t.Errorf("ReadFrom() signature validation failed: %v", err) + if err := sig.validate(); err != nil { + t.Errorf("readFrom() signature validation failed: %v", err) } // Verify SignatureFlags and SignatureTimestamp match @@ -285,6 +328,8 @@ func TestSignatureReadFrom(t *testing.T) { } // TestSignatureReadFromIncompleteData verifies ReadFrom handles incomplete data +// +//nolint:gocognit // table-driven incomplete cases func TestSignatureReadFromIncompleteData(t *testing.T) { tests := []struct { name string @@ -293,15 +338,7 @@ func TestSignatureReadFromIncompleteData(t *testing.T) { {"No data", []byte{}}, {"Partial header", make([]byte, 8)}, {"Almost complete header", make([]byte, 17)}, - {"Header but no data", func() []byte { - buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType - _ = binary.Write(buf, binary.LittleEndian, uint32(64)) // SignatureSize - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureTimestamp - _ = binary.Write(buf, binary.LittleEndian, uint16(0)) // CommentLength - return buf.Bytes() // Only 18 bytes, but SignatureSize says 64 bytes needed - }()}, + {"Header but no data", signatureHeaderBytes(1, 64, 0, 0, 0)}, // Only 18 bytes, SignatureSize says 64 needed {"Header with comment but incomplete comment", func() []byte { buf := new(bytes.Buffer) _ = binary.Write(buf, 
binary.LittleEndian, uint32(1)) // SignatureType @@ -312,55 +349,10 @@ func TestSignatureReadFromIncompleteData(t *testing.T) { buf.WriteString("test") // Only 4 bytes of 10 return buf.Bytes() }()}, - {"Header with comment but incomplete signature data", func() []byte { - buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType - _ = binary.Write(buf, binary.LittleEndian, uint32(64)) // SignatureSize - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureTimestamp - _ = binary.Write(buf, binary.LittleEndian, uint16(10)) // CommentLength - comment := "test comment" - buf.Write([]byte(comment[:10])) // Exactly 10 bytes - partialData := make([]byte, 30) // Only 30 bytes of 64 signature data - buf.Write(partialData) - return buf.Bytes() - }()}, - {"Header with comment but incomplete signature data (exact boundary)", func() []byte { - buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType - _ = binary.Write(buf, binary.LittleEndian, uint32(64)) // SignatureSize - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureTimestamp - _ = binary.Write(buf, binary.LittleEndian, uint16(10)) // CommentLength - comment := "test comment" - buf.Write([]byte(comment[:10])) // Exactly 10 bytes - partialData := make([]byte, 63) // Only 63 bytes of 64 signature data (exact boundary) - buf.Write(partialData) - return buf.Bytes() - }()}, - {"Header with comment but incomplete comment (exact boundary)", func() []byte { - buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType - _ = binary.Write(buf, binary.LittleEndian, uint32(64)) // SignatureSize - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // 
SignatureTimestamp - _ = binary.Write(buf, binary.LittleEndian, uint16(10)) // CommentLength - comment := "test comment" - buf.Write([]byte(comment[:9])) // Only 9 bytes of 10 (exact boundary) - return buf.Bytes() - }()}, - {"Header with comment but no signature data when SignatureSize > 0", func() []byte { - buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType - _ = binary.Write(buf, binary.LittleEndian, uint32(64)) // SignatureSize - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureTimestamp - _ = binary.Write(buf, binary.LittleEndian, uint16(10)) // CommentLength - comment := "test comment" - buf.Write([]byte(comment[:10])) // Exactly 10 bytes - // No signature data - return buf.Bytes() - }()}, + {"Header with comment but incomplete signature data", signatureHeaderWithCommentAndPartialData(30)}, + {"Header with comment but incomplete signature data (exact boundary)", signatureHeaderWithCommentAndPartialData(63)}, + {"Header with comment but incomplete comment (exact boundary)", signatureHeaderWithCommentBytes(9)}, + {"Header with comment but no signature data when SignatureSize > 0", signatureHeaderWithCommentBytes(10)}, {"Incomplete SignatureSize read", func() []byte { buf := new(bytes.Buffer) _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType @@ -395,29 +387,20 @@ func TestSignatureReadFromIncompleteData(t *testing.T) { buf.Write([]byte{0x00}) return buf.Bytes() }()}, - {"Valid signature with zero SignatureSize and zero CommentLength", func() []byte { - buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureSize = 0 - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureTimestamp - _ = binary.Write(buf, 
binary.LittleEndian, uint16(0)) // CommentLength = 0 - // Complete header, no data (valid case) - return buf.Bytes() - }()}, + {"Valid signature with zero SignatureSize and zero CommentLength", signatureHeaderBytes(1, 0, 0, 0, 0)}, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { var sig Signature r := bytes.NewReader(tt.data) - _, err := sig.ReadFrom(r) + _, err := sig.readFrom(r) // Check if this is a valid case (zero sizes) isValidZeroCase := strings.Contains(tt.name, "Valid signature with zero") if isValidZeroCase { if err != nil { - t.Errorf("ReadFrom() expected success for valid zero-size signature, got error: %v", err) + t.Errorf("readFrom() expected success for valid zero-size signature, got error: %v", err) } // Verify the signature was read correctly if sig.SignatureType != 1 { @@ -429,16 +412,16 @@ func TestSignatureReadFromIncompleteData(t *testing.T) { if sig.CommentLength != 0 { t.Errorf("CommentLength = %d, want 0", sig.CommentLength) } - } else { - if err == nil { - t.Errorf("ReadFrom() expected error for incomplete data, got nil") - } + } else if err == nil { + t.Errorf("readFrom() expected error for incomplete data, got nil") } }) } } // TestSignatureReadFromNonEOFErrors verifies ReadFrom handles non-EOF errors +// +//nolint:gocognit // table-driven non-EOF error cases func TestSignatureReadFromNonEOFErrors(t *testing.T) { tests := []struct { name string @@ -501,41 +484,17 @@ func TestSignatureReadFromNonEOFErrors(t *testing.T) { }(), true, }, - { - "Error reader during comment read", - func() io.Reader { - buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType - _ = binary.Write(buf, binary.LittleEndian, uint32(64)) // SignatureSize - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureTimestamp - _ = binary.Write(buf, binary.LittleEndian, uint16(10)) // CommentLength = 10 - return io.MultiReader(buf, 
testhelpers.NewErrorReader()) - }(), - true, - }, - { - "Error reader during signature data read", - func() io.Reader { - buf := new(bytes.Buffer) - _ = binary.Write(buf, binary.LittleEndian, uint32(1)) // SignatureType - _ = binary.Write(buf, binary.LittleEndian, uint32(64)) // SignatureSize - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureFlags - _ = binary.Write(buf, binary.LittleEndian, uint32(0)) // SignatureTimestamp - _ = binary.Write(buf, binary.LittleEndian, uint16(0)) // CommentLength = 0 - return io.MultiReader(buf, testhelpers.NewErrorReader()) - }(), - true, - }, + {"Error reader during comment read", signatureHeaderThenErrorReader(10), true}, + {"Error reader during signature data read", signatureHeaderThenErrorReader(0), true}, } for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { var sig Signature - _, err := sig.ReadFrom(tt.reader) + _, err := sig.readFrom(tt.reader) if (err != nil) != tt.wantErr { - t.Errorf("ReadFrom() error = %v, wantErr %v", err, tt.wantErr) + t.Errorf("readFrom() error = %v, wantErr %v", err, tt.wantErr) return } @@ -544,7 +503,7 @@ func TestSignatureReadFromNonEOFErrors(t *testing.T) { if strings.Contains(tt.name, "Error reader") { errStr := err.Error() if strings.Contains(errStr, "EOF") || strings.Contains(errStr, "incomplete") { - t.Errorf("ReadFrom() error = %q, want non-EOF error for error reader", errStr) + t.Errorf("readFrom() error = %q, want non-EOF error for error reader", errStr) } } } @@ -554,6 +513,8 @@ func TestSignatureReadFromNonEOFErrors(t *testing.T) { // TestSignatureWriteTo verifies WriteTo serialization // Specification: package_file_format.md: 8.1 Signature Structure +// +//nolint:gocognit // table-driven write cases func TestSignatureWriteTo(t *testing.T) { tests := []struct { name string @@ -634,7 +595,7 @@ func TestSignatureWriteTo(t *testing.T) { tt.sig.SignatureSize = uint32(len(tt.sig.SignatureData)) var buf bytes.Buffer - n, err := tt.sig.WriteTo(&buf) + n, err := 
tt.sig.writeTo(&buf) if (err != nil) != tt.wantErr { t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) @@ -642,7 +603,7 @@ func TestSignatureWriteTo(t *testing.T) { } if !tt.wantErr { - expectedSize := tt.sig.Size() + expectedSize := tt.sig.size() if n != int64(expectedSize) { t.Errorf("WriteTo() wrote %d bytes, want %d", n, expectedSize) } @@ -653,7 +614,7 @@ func TestSignatureWriteTo(t *testing.T) { // Verify we can read it back var sig Signature - _, readErr := sig.ReadFrom(&buf) + _, readErr := sig.readFrom(&buf) if readErr != nil { t.Errorf("Failed to read back written data: %v", readErr) } @@ -670,6 +631,8 @@ func TestSignatureWriteTo(t *testing.T) { } // TestSignatureRoundTrip verifies round-trip serialization +// +//nolint:gocognit // table-driven round-trip cases func TestSignatureRoundTrip(t *testing.T) { tests := []struct { name string @@ -744,13 +707,13 @@ func TestSignatureRoundTrip(t *testing.T) { // Write var buf bytes.Buffer - if _, err := tt.sig.WriteTo(&buf); err != nil { + if _, err := tt.sig.writeTo(&buf); err != nil { t.Fatalf("WriteTo() error = %v", err) } // Read var sig Signature - if _, err := sig.ReadFrom(&buf); err != nil { + if _, err := sig.readFrom(&buf); err != nil { t.Fatalf("ReadFrom() error = %v", err) } @@ -778,7 +741,7 @@ func TestSignatureRoundTrip(t *testing.T) { } // Validate - if err := sig.Validate(); err != nil { + if err := sig.validate(); err != nil { t.Errorf("Round-trip signature validation failed: %v", err) } }) @@ -786,6 +749,8 @@ func TestSignatureRoundTrip(t *testing.T) { } // TestSignatureWriteToErrorPaths verifies WriteTo error handling +// +//nolint:gocognit // table-driven error paths func TestSignatureWriteToErrorPaths(t *testing.T) { tests := []struct { name string @@ -885,7 +850,7 @@ func TestSignatureWriteToErrorPaths(t *testing.T) { SignatureSize: 64, SignatureData: make([]byte, 64), CommentLength: 10, - SignatureComment: "test comment", + SignatureComment: sigTestComment, }, 
testhelpers.NewFailingWriter(17), // Allow header (18 bytes) but fail during comment write true, @@ -898,7 +863,7 @@ func TestSignatureWriteToErrorPaths(t *testing.T) { SignatureSize: 64, SignatureData: make([]byte, 64), CommentLength: 10, - SignatureComment: "test comment", + SignatureComment: sigTestComment, }, testhelpers.NewIncompleteWriter(20), true, @@ -935,7 +900,7 @@ func TestSignatureWriteToErrorPaths(t *testing.T) { SignatureSize: 64, SignatureData: make([]byte, 64), CommentLength: 10, - SignatureComment: "test comment", + SignatureComment: sigTestComment, }, testhelpers.NewFailingWriter(28), // Allow header (18) + comment (10) but fail during data write true, @@ -948,7 +913,7 @@ func TestSignatureWriteToErrorPaths(t *testing.T) { SignatureSize: 64, SignatureData: make([]byte, 64), CommentLength: 10, - SignatureComment: "test comment", + SignatureComment: sigTestComment, }, testhelpers.NewIncompleteWriter(40), // Allow header (18) + comment (10) + partial data (12) true, @@ -985,7 +950,7 @@ func TestSignatureWriteToErrorPaths(t *testing.T) { SignatureSize: 64, SignatureData: make([]byte, 64), CommentLength: 10, - SignatureComment: "test comment", + SignatureComment: sigTestComment, }, testhelpers.NewIncompleteWriter(27), // Allow header (18) + 9 bytes of comment (need 10) true, @@ -1072,7 +1037,7 @@ func TestSignatureWriteToErrorPaths(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { // WriteTo updates lengths first - _, err := tt.sig.WriteTo(tt.writer) + _, err := tt.sig.writeTo(tt.writer) if (err != nil) != tt.wantErr { t.Errorf("WriteTo() error = %v, wantErr %v", err, tt.wantErr) From cc1ae1147d38e5e38e8b429cc7c63de3369e1f1d Mon Sep 17 00:00:00 2001 From: Andre Date: Tue, 3 Feb 2026 03:39:19 -0500 Subject: [PATCH 6/7] test(bdd): dedupe feature and tighten formatting Remove duplicate package state transition feature block and normalize PathMetadataPatch literal formatting. 
--- .../basic_ops/package_state_transitions.feature | 14 -------------- features/file_mgmt/path_metadata_patch.feature | 5 +---- 2 files changed, 1 insertion(+), 18 deletions(-) diff --git a/features/basic_ops/package_state_transitions.feature b/features/basic_ops/package_state_transitions.feature index 3f05a3b6..8d2abe2f 100644 --- a/features/basic_ops/package_state_transitions.feature +++ b/features/basic_ops/package_state_transitions.feature @@ -10,17 +10,3 @@ Feature: Package state transitions And invalid transitions are prevented And transitions preserve internal invariants and resource lifecycle rules And transitions align with state validation and state-dependent operations - -@domain:basic_ops @m2 @REQ-API_BASIC-199 @spec(api_basic_operations.md#3332-state-transitions) -Feature: Package state transitions - - @REQ-API_BASIC-199 @happy - Scenario: Package moves between lifecycle states via defined transitions - Given a package lifecycle - When the package is created opened and closed - Then transitions between states are defined - And transitions occur only through valid operations - And invalid transitions are prevented - And transitions preserve internal invariants and resource lifecycle rules - And transitions align with state validation and state-dependent operations - diff --git a/features/file_mgmt/path_metadata_patch.feature b/features/file_mgmt/path_metadata_patch.feature index 7a20c210..4d827fae 100644 --- a/features/file_mgmt/path_metadata_patch.feature +++ b/features/file_mgmt/path_metadata_patch.feature @@ -28,10 +28,7 @@ Feature: PathMetadataPatch in AddFileOptions @REQ-FILEMGMT-412 @happy Scenario: PathMetadataPatch sets DestPath and DestPathWin Given an open NovusPack package - When AddFile is called with PathMetadataPatch{ - DestPath: Option[string]{Value: "/unix/path"}, - DestPathWin: Option[string]{Value: "C:\\win\\path"} - } + When AddFile is called with PathMetadataPatch{DestPath: Option[string]{Value: "/unix/path"}, DestPathWin: 
Option[string]{Value: "C:\\win\\path"}} Then PathMetadataEntry.DestPath is set to "/unix/path" And PathMetadataEntry.DestPathWin is set to "C:\\win\\path" From d615ca58d78f9b903a7976b72b60a248104f0738 Mon Sep 17 00:00:00 2001 From: Andre Date: Tue, 3 Feb 2026 03:48:25 -0500 Subject: [PATCH 7/7] fix(ci,scripts): align python-lint deps and fix F821 in go_markdown - CI: install lint tooling from scripts/requirements-lint.txt to match local - go_markdown/_base.py: add from __future__ import annotations for Signature forward ref --- .github/workflows/python-lint.yml | 2 +- scripts/lib/go_markdown/_base.py | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/.github/workflows/python-lint.yml b/.github/workflows/python-lint.yml index 37a5e76a..13ed190c 100644 --- a/.github/workflows/python-lint.yml +++ b/.github/workflows/python-lint.yml @@ -28,7 +28,7 @@ jobs: python-version: '3.11' - name: Install Python lint tooling - run: pip install flake8 pylint radon xenon vulture bandit + run: pip install -r scripts/requirements-lint.txt - name: Run python linting run: make lint-python diff --git a/scripts/lib/go_markdown/_base.py b/scripts/lib/go_markdown/_base.py index aa656b30..aa405e14 100644 --- a/scripts/lib/go_markdown/_base.py +++ b/scripts/lib/go_markdown/_base.py @@ -8,6 +8,7 @@ - Normalizing Go signatures and type names - Detecting example code (single lines and entire code blocks) """ +from __future__ import annotations import re from dataclasses import dataclass