Multi-language refactor: sync-versions.py + install-hooks.sh should support Python, Rust, C/C++, Node.js, Go #4

@djdarcy

Multi-language refactor: sync-versions.py + install-hooks.sh should support Python, Rust, C/C++, Node.js, Go (epic)

Problem

sync-versions.py and install-hooks.sh currently hardcode the Python-project shape:

  • A single source of truth at <package>/_version.py containing MAJOR, MINOR, PATCH, PHASE constants
  • Configuration loaded from pyproject.toml [tool.repokit-common]
  • A CHANGELOG.md with compare links between PEP 440 tags ([0.2.3]: <repo>/compare/v0.2.2...v0.2.3)
  • Tag format vX.Y.Z or vX.Y.Za1 (PEP 440)

This shape doesn't fit:

  • Rust: version lives in Cargo.toml [package] version = "..."; no Python package, may have no pyproject.toml
  • C / C++: version may live in .rc resource files (Windows), a top-level VERSION file, or a custom version.h header. .rc and .lrf files are typically Windows-1252 encoded, which makes naive UTF-8 round-trips destructive. The © byte 0xA9 is not valid UTF-8, so a strict decode fails outright and a lenient one turns it into U+FFFD, which is written back as the replacement-character bytes 0xEF 0xBF 0xBD
  • Node.js: version lives in the package.json "version" field
  • Go: version is typically injected at build time via -ldflags; if a file holds it, it's a .go constant, not a Python module

Concrete failure: a downstream consumer (DazzleTools/Dazzle-Locate64, a C++ project; see Dazzle-Locate64#8) cannot use sync-versions.py at all today. Running it fails with:

FileNotFoundError: Cannot find $PACKAGE_NAME/_version.py

…because there is no Python package. The maintainer hand-bumped 9 .rc / .lrf / .cpp files via PowerShell using a codepage-1252-safe replacement loop, working around the script entirely. Every release will need this manual hand-bump until sync-versions.py learns the C++ project shape.

The pre-commit hook installed by install-hooks.sh would fail on every commit in such a project, because it calls sync-versions.py --auto.

This problem repeats for any future Rust / Node.js / Go / mixed consumer. The longer the script stays Python-only, the more downstream consumers will either fork the script locally (defeating the point of a shared subtree) or skip versioning automation entirely.

Why it matters

git-repokit-common is consumed as a git subtree by multiple downstream projects. Today's sync-versions.py is fine for Python projects; for everyone else it's dead weight. Without a multi-language refactor:

  • C++ / Rust / etc. consumers must hand-edit every release (error-prone, easy to forget files)
  • Pre-commit hooks must be commented out or omitted
  • Each consumer reinvents its own version-bump logic
  • install-hooks.sh becomes language-specific by accident

Proposed solution

Introduce a language-flavor backend system in sync-versions.py. Existing Python consumers keep working with zero config changes; non-Python consumers declare their flavor in a small manifest and sync-versions.py dispatches to the right backend.

The high-level shape:

sync-versions.py
├── core: project root discovery, --check / --bump / --set / --dry-run / --auto
├── language detection: read .repokit.json (new) or fall back to current Python heuristics
└── backends:
    ├── python  (existing _version.py + pyproject.toml flow — unchanged for current users)
    ├── rust    (Cargo.toml [package] version)
    ├── cpp     (top-level VERSION file + .rc / .lrf / .cpp string-literal propagation)
    ├── nodejs  (package.json "version")
    └── go      (version.go const or `-ldflags` informational)

Same CLI surface for every language:

python scripts/sync-versions.py --check
python scripts/sync-versions.py --bump patch
python scripts/sync-versions.py --bump minor
python scripts/sync-versions.py --bump major
python scripts/sync-versions.py --bump build       # NEW — useful for C++ build-counter scheme
python scripts/sync-versions.py --set 1.2.3
python scripts/sync-versions.py --auto             # for pre-commit hook
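The reset rule behind --bump is not spelled out above. Here is a minimal sketch of one plausible rule. It assumes SemVer-style semantics, where bumping a part zeroes the less significant parts the component set actually has. It also assumes, hypothetically, that --bump build increments only the BUILD_MINOR counter from the C++ manifest example. The bump() helper name is illustrative.

```python
# Parts ordered from most to least significant (SemVer-style assumption).
_ORDER = ("MAJOR", "MINOR", "PATCH")


def bump(components: dict, part: str) -> dict:
    """Return a copy of `components` with `part` incremented.

    Assumptions: major/minor/patch zero every less significant part that
    the component set actually has; "build" (hypothetical) only
    increments BUILD_MINOR and leaves everything else alone.
    """
    result = dict(components)
    if part == "build":
        result["BUILD_MINOR"] = int(result.get("BUILD_MINOR", 0)) + 1
        return result
    key = part.upper()
    if key not in _ORDER:
        raise ValueError(f"unknown bump part: {part!r}")
    result[key] = int(result.get(key, 0)) + 1
    for lower in _ORDER[_ORDER.index(key) + 1:]:
        if lower in result:
            result[lower] = 0
    return result


print(bump({"MAJOR": 0, "MINOR": 2, "PATCH": 3}, "minor"))
# {'MAJOR': 0, 'MINOR': 3, 'PATCH': 0}
```

The int() calls let the same helper accept string values, such as those read from a key=value VERSION file.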

Language flavor declaration

Today, Python consumers use pyproject.toml [tool.repokit-common]. Other languages may have no pyproject.toml at all (and adding one purely for tooling is awkward).

Proposal: introduce an optional top-level .repokit.json (or .repokit-common.toml) manifest:

{
  "language": "cpp",
  "version-source": "VERSION",
  "version-targets": [
    {"path": "vendor/locate32-trunk/Locate/Locate32/commonres.rc", "encoding": "cp1252",
     "patterns": ["FILEVERSION ${MAJOR},${MINOR},${BUILD_MAJOR},${BUILD_MINOR}",
                  "PRODUCTVERSION ${MAJOR},${MINOR},${BUILD_MAJOR},${BUILD_MINOR}",
                  "\"${MAJOR}.${MINOR} ${PHASE} build ${BUILD_MAJOR}.${BUILD_MINOR}\""]},
    {"path": "vendor/locate32-trunk/Locate/locate/Locate.cpp", "encoding": "utf-8",
     "patterns": ["szVersionStr=\"locate ${MAJOR}.${MINOR} ${PHASE} build ${BUILD_MAJOR}.${BUILD_MINOR_CLI}\""]}
  ],
  "tag-format": "v${MAJOR}.${MINOR}-${PHASE}-build${BUILD_MAJOR}.${BUILD_MINOR}",
  "changelog": {
    "path": "CHANGELOG.md",
    "compare-link-format": "human"
  }
}
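The ${NAME} placeholders in this manifest match the syntax of Python's string.Template, so expanding both the patterns and tag-format could be as small as the sketch below. The expand() helper is hypothetical. The component values are the ones implied by the v3.2-RC4-build11.8221 tag cited under Design considerations.

```python
from string import Template


def expand(pattern: str, components: dict) -> str:
    """Fill ${NAME} placeholders from `components` (hypothetical helper)."""
    # substitute() raises KeyError for a missing component instead of
    # leaving a literal "${...}" behind, which safe_substitute() would do.
    return Template(pattern).substitute({k: str(v) for k, v in components.items()})


components = {"MAJOR": 3, "MINOR": 2, "PHASE": "RC4", "BUILD_MAJOR": 11, "BUILD_MINOR": 8221}
print(expand("v${MAJOR}.${MINOR}-${PHASE}-build${BUILD_MAJOR}.${BUILD_MINOR}", components))
# v3.2-RC4-build11.8221
print(expand("FILEVERSION ${MAJOR},${MINOR},${BUILD_MAJOR},${BUILD_MINOR}", components))
# FILEVERSION 3,2,11,8221
```

Using substitute() rather than safe_substitute() is a deliberate choice here. A misspelled placeholder in the manifest fails loudly instead of leaking into a release file.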

For Python consumers, .repokit.json is optional — if absent, the existing pyproject.toml [tool.repokit-common] lookup still applies. This is the backward-compat guarantee.

Language matrix

| Language | Source of truth (default) | Encoding hazards | Tag default |
|---|---|---|---|
| Python | <package>/_version.py (MAJOR/MINOR/PATCH/PHASE) | UTF-8, low risk | PEP 440 (v0.2.3a1) |
| Rust | Cargo.toml [package] version = "X.Y.Z" | UTF-8, low risk | SemVer (v0.2.3) |
| C / C++ | Top-level VERSION file (key=value) | Windows-1252 for .rc / .lrf propagation targets; must NEVER round-trip through UTF-8 | Configurable; consumers may want vX.Y-RCN-buildA.B |
| Node.js | package.json "version" | UTF-8, JSON-encoded | SemVer |
| Go | version.go const Version = "X.Y.Z" (optional) | UTF-8, low risk | SemVer |

Implementation approach

Phase 1 — extract the language abstraction

  • Refactor sync-versions.py to delegate to a Backend interface:
    class Backend:
        def read_components(self) -> dict: ...
        def write_components(self, components: dict, dry_run: bool) -> bool: ...
        def get_propagation_targets(self) -> list[Target]: ...
        def write_target(self, target: Target, version_string: str, dry_run: bool) -> bool: ...
        def format_tag(self, components: dict) -> str: ...
        def format_human_version(self, components: dict) -> str: ...
  • Existing Python logic moves into PythonBackend with no behavior change
  • Add a detect_backend() function that reads .repokit.json first, falls back to legacy pyproject.toml [tool.repokit-common] Python detection
  • Existing --check, --bump, --set, --dry-run, --auto flags work for any backend
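Phase 1 could be implemented with abc and a small name-to-class registry, as sketched below. BACKENDS, register, the config attribute and the placeholder PythonBackend bodies are all hypothetical. Only the precedence rule comes from this issue: .repokit.json first, then pyproject.toml [tool.repokit-common], with a warning when both exist. The sketch assumes Python 3.11+ for tomllib, or the tomli backport on older interpreters.

```python
import json
import warnings
from abc import ABC, abstractmethod
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older interpreters

BACKENDS = {}  # hypothetical registry: manifest "language" value -> Backend class


def register(name):
    """Class decorator recording a backend under its manifest language name."""
    def wrap(cls):
        BACKENDS[name] = cls
        return cls
    return wrap


class Backend(ABC):
    def __init__(self, root: Path, config: dict):
        self.root = root
        self.config = config

    @abstractmethod
    def read_components(self) -> dict: ...

    @abstractmethod
    def write_components(self, components: dict, dry_run: bool) -> bool: ...


@register("python")
class PythonBackend(Backend):
    # Placeholder bodies: today's _version.py logic would move here unchanged.
    def read_components(self) -> dict:
        return {}

    def write_components(self, components: dict, dry_run: bool) -> bool:
        return False


def detect_backend(root: Path) -> Backend:
    legacy = None
    pyproject = root / "pyproject.toml"
    if pyproject.is_file():
        with pyproject.open("rb") as f:  # tomllib.load() requires a binary file
            legacy = tomllib.load(f).get("tool", {}).get("repokit-common")
    manifest_path = root / ".repokit.json"
    if manifest_path.is_file():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        if legacy is not None:
            warnings.warn(".repokit.json takes precedence over pyproject.toml [tool.repokit-common]")
        return BACKENDS[manifest["language"]](root, manifest)
    if legacy is not None:
        return BACKENDS["python"](root, legacy)  # unchanged path for existing consumers
    raise FileNotFoundError(f"{root}: no .repokit.json or pyproject.toml [tool.repokit-common]")
```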

Phase 2 — codepage-aware target writer

  • New Target class carries an optional encoding field (defaulting to UTF-8)
  • Write path always reads bytes, decodes with declared encoding, performs string-replace, encodes with same encoding, writes bytes — never opens files in text mode
  • Add a regression test that round-trips 0xA9 © through codepage-1252 without corruption
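A sketch of the Phase 2 writer follows. The Target dataclass and the write_target() signature are assumptions, not the script's current API. Because it reads and writes only bytes, it preserves the file's line endings as well as its codepage.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Target:
    path: Path
    encoding: str = "utf-8"  # declared in the manifest, never guessed


def write_target(target: Target, old: str, new: str, dry_run: bool = False) -> bool:
    """Replace `old` with `new` without ever opening the file in text mode.

    Returns True if the content changes (or would change, under dry_run).
    """
    raw = target.path.read_bytes()        # bytes: no implicit decode, CRLF kept
    text = raw.decode(target.encoding)    # strict: bad bytes raise, never U+FFFD
    if old not in text:
        # Fail loudly so a stale manifest pattern cannot be skipped silently.
        raise ValueError(f"{target.path}: pattern not found: {old!r}")
    updated = text.replace(old, new)
    if updated == text:
        return False
    if not dry_run:
        target.path.write_bytes(updated.encode(target.encoding))  # same codepage out
    return True
```

Because the decode is strict, a file that does not match its declared encoding fails loudly instead of being silently rewritten. One caveat: Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so a file containing any of those bytes will raise even if other tools accept it.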

Phase 3 — Rust backend

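  • Reads Cargo.toml via tomllib (Python 3.11+) or the tomli backport
  • Updates the [package] version field; since tomllib / tomli are read-only, this needs a format-preserving textual edit (or a round-trip library such as tomlkit)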
  • Optionally syncs to Cargo.lock (requires cargo update -p <pkg> invocation — leave as a follow-on)
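Because tomllib can only parse TOML, a Rust backend might read the version with tomllib but rewrite it textually. The sketch below confines the edit to the [package] table, so dependency versions and rust-version are never touched. The helper names are hypothetical. The line-based table detection assumes a conventionally formatted Cargo.toml, and workspace inheritance (version.workspace = true) is deliberately rejected rather than handled.

```python
import re

try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport

# A `version = "..."` line; applied only to the text of the [package] table.
_VERSION_LINE = re.compile(r'^([ \t]*version[ \t]*=[ \t]*")([^"]*)(")', re.MULTILINE)
_PACKAGE_HEADER = re.compile(r"^\[package\][ \t]*$", re.MULTILINE)
_ANY_HEADER = re.compile(r"^[ \t]*\[", re.MULTILINE)


def read_cargo_version(text: str) -> str:
    return tomllib.loads(text)["package"]["version"]


def set_cargo_version(text: str, new_version: str) -> str:
    """Rewrite [package] version in place, keeping comments and layout."""
    header = _PACKAGE_HEADER.search(text)
    if header is None:
        raise ValueError("Cargo.toml has no [package] table")
    start = header.end()
    # The table runs until the next table header, or the end of the file.
    nxt = _ANY_HEADER.search(text, start)
    end = nxt.start() if nxt else len(text)
    body, n = _VERSION_LINE.subn(
        lambda m: m.group(1) + new_version + m.group(3), text[start:end], count=1)
    if n != 1:
        raise ValueError("no literal version key in [package]")
    return text[:start] + body + text[end:]
```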

Phase 4 — C / C++ backend

  • Reads VERSION (key=value file) as the source of truth
  • Iterates version-targets in .repokit.json, applying string substitution with codepage awareness
  • Supports the multi-track build-number scheme used by Locate32 (separate counters for CLI vs GUI vs language pack)
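A sketch of the C / C++ flow follows. It assumes VERSION holds plain KEY=value lines; the '#' comment syntax is also an assumption. Each manifest pattern is expanded once with the old components and once with the new ones, which yields an exact search string and its replacement for the codepage-aware writer. The multi-track scheme then falls out naturally: BUILD_MINOR_CLI is just another key, so a GUI-only bump produces an unchanged pair for the CLI pattern. The helper names and the component values in the test are illustrative.

```python
from string import Template


def parse_version_file(text: str) -> dict:
    """Parse KEY=value lines; blank lines and '#' comments (assumed syntax) are skipped."""
    components = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"VERSION line {lineno}: expected KEY=value, got {line!r}")
        components[key.strip()] = value.strip()
    return components


def plan_replacements(patterns: list, old: dict, new: dict) -> list:
    """Expand each manifest pattern with the old and the new components.

    Returns (search, replace) pairs ready for a codepage-aware writer.
    """
    pairs = []
    for pattern in patterns:
        template = Template(pattern)
        pairs.append((template.substitute(old), template.substitute(new)))
    return pairs
```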

Phase 5 — Node.js backend

  • Reads / writes package.json "version" field via JSON
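Here is a sketch of the Node.js path, assuming a standard package.json with a top-level "version" field. Python's json module keeps key order through a load/dump round trip, and reusing the file's existing indentation keeps the diff small. The function name is hypothetical.

```python
import json
import re


def set_package_json_version(text: str, new_version: str) -> str:
    """Return package.json text with its "version" field replaced (hypothetical helper)."""
    data = json.loads(text)  # dicts keep key order through load and dump
    if not isinstance(data, dict) or "version" not in data:
        raise KeyError('package.json has no top-level "version" field')
    data["version"] = new_version
    # Reuse the file's own indentation (first indented key) to keep diffs small.
    indent = re.search(r'^([ \t]+)"', text, re.MULTILINE)
    return json.dumps(data, indent=indent.group(1) if indent else 2, ensure_ascii=False) + "\n"
```

The trade-off of going through json is that hand formatting, such as short arrays kept on one line, is normalized on write. A targeted regex edit of the "version" line would avoid that, at the cost of not validating the JSON.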

Phase 6 — Go backend

  • Reads / writes version.go const Version (regex-based)
  • Documents the -ldflags alternative for users who don't want a version file
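Here is a regex-based sketch for Go. It assumes the version lives in a single const Version declaration, optionally typed as string. A grouped const ( ... ) block is not handled. The function names are hypothetical.

```python
import re

# `const Version = "X.Y.Z"`, optionally typed as `const Version string = "..."`.
_GO_VERSION = re.compile(r'(\bconst\s+Version\s*(?:string\s*)?=\s*")([^"]*)(")')


def read_go_version(src: str) -> str:
    m = _GO_VERSION.search(src)
    if m is None:
        raise ValueError('no `const Version = "..."` declaration found')
    return m.group(2)


def set_go_version(src: str, new_version: str) -> str:
    out, n = _GO_VERSION.subn(lambda m: m.group(1) + new_version + m.group(3), src, count=1)
    if n != 1:
        raise ValueError('no `const Version = "..."` declaration found')
    return out
```

For the -ldflags alternative, the linker's -X flag can only set a string variable, not a constant. A project using it would therefore declare var Version = "dev" and build with something like go build -ldflags "-X main.Version=1.2.3", where main.Version depends on the package path.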

Phase 7 — install-hooks.sh language detection

  • Hook script detects language via the same .repokit.json lookup
  • For Python: existing pre-commit hook calls sync-versions.py --auto
  • For other languages: same call (the dispatcher handles it transparently)
  • Pre-commit hook never errors on a project where _version.py is absent (the script itself decides what to do)
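One way --auto could keep the hook safe in any repository is sketched below, using a hypothetical run_auto() entry point that receives the real sync step as a callable. It exits 0, letting the commit proceed, when neither .repokit.json nor a legacy <package>/_version.py is present. It returns non-zero only when a configured sync actually fails.

```python
import sys
from pathlib import Path
from typing import Callable


def run_auto(root: Path, sync: Callable[[Path], bool]) -> int:
    """Hypothetical --auto entry point; returns the hook's exit code.

    `sync` stands in for "detect the backend and propagate the version";
    it returns True on success.
    """
    has_manifest = (root / ".repokit.json").is_file()
    has_legacy = any(root.glob("*/_version.py"))  # today's Python layout
    if not (has_manifest or has_legacy):
        # Nothing configured: say so, but never block the commit.
        print("sync-versions --auto: no version source configured, skipping", file=sys.stderr)
        return 0
    return 0 if sync(root) else 1
```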

Phase 8 — Documentation

  • New docs/LANGUAGES.md documenting each backend, the manifest format, and how to add a new language
  • Update README.md to mention multi-language support
  • Migration guide for existing Python consumers (TL;DR: do nothing; or optionally adopt .repokit.json for clarity)

Design considerations

  • Codepage 1252 is non-negotiable for .rc/.lrf files. A file that contains © (0xA9) WILL be corrupted by naive UTF-8 round-trip. The script must read bytes, decode in the declared encoding, write bytes back. Add this as a regression test.
  • Backward compatibility is paramount. Every existing Python consumer must continue to work without any config change. This means: .repokit.json is optional, the legacy pyproject.toml path stays the default for Python, every existing CLI flag behaves identically.
  • Tag format flexibility. Python uses PEP 440; Rust / Node.js / Go use SemVer; C++ projects (especially forks of older codebases) may use bespoke schemes like v3.2-RC4-build11.8221. Make the tag format a string template parameterized by version components.
  • Component naming. Python uses MAJOR/MINOR/PATCH/PHASE/PRE_RELEASE_NUM. C++ may use MAJOR/MINOR/BUILD_MAJOR/BUILD_MINOR/PHASE. The component set must be language-flexible — backends declare their own component schema.
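  • Atomic writes. When propagating across many files, partial-write failures are dangerous. Compute every file's new contents first, then stage each one as a temp file in its target's own directory (so the rename stays on one filesystem), fsync, and rename it over the original. Or use a single transaction-style approach. Don't half-update a project's files.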
  • Encoding detection vs declaration. Detection (chardet) is unreliable. Make the manifest authoritative — if a target is cp1252, declare it; the script trusts the manifest.
  • Out of scope for this issue: auto-detecting the language from filesystem heuristics (e.g., "if Cargo.toml exists, assume Rust"). Explicit declaration via manifest is more reliable. Heuristic detection can be a follow-on.
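The atomic-writes point can be sketched as plan-then-commit. First compute every file's new bytes, so that a pattern mismatch aborts before anything is touched. Then stage each file as a temp file in the target's directory, fsync it, and os.replace() it into place. commit_all() is a hypothetical helper. One limit applies: each os.replace() is atomic, but the batch is not. A crash during the final loop can still leave a mix of old and new files, though never a half-written one.

```python
import os
import shutil
import tempfile
from pathlib import Path


def commit_all(planned: dict) -> None:
    """Write planned {Path: new_bytes} only after every file is staged.

    Each file is staged in its own directory (so the final rename stays on one
    filesystem), fsynced, then moved into place with os.replace().
    """
    staged = []
    try:
        for path, data in planned.items():
            fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.")
            staged.append((tmp, path))
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            if path.exists():
                shutil.copymode(path, tmp)  # mkstemp creates 0600; keep original mode
    except BaseException:
        for tmp, _ in staged:  # nothing has been replaced yet: just clean up
            os.unlink(tmp)
        raise
    # Each os.replace() is atomic; the batch as a whole is not.
    for tmp, path in staged:
        os.replace(tmp, path)
```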

Acceptance criteria

Python (regression — must not break)

  • sync-versions.py --check and --bump <part> work identically on existing Python projects with no manifest change
  • pyproject.toml [tool.repokit-common] continues to be honored
  • Pre-commit hook continues to call sync-versions.py --auto and stage the version file

Manifest

  • .repokit.json schema documented in docs/LANGUAGES.md
  • Schema validation with helpful error messages for malformed manifests
  • When both .repokit.json and pyproject.toml [tool.repokit-common] are present, .repokit.json wins (with a warning)
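Here is a dependency-free sketch of manifest validation that collects every problem instead of stopping at the first. Its rules are assumptions about what the documented schema would require. The language must be one of the backend names from the sync-versions.py tree above. Each encoding must be known to Python's codecs registry. Patterns must be a non-empty list of strings. A JSON Schema validated with the third-party jsonschema package would be an alternative.

```python
import codecs

# Backend names from the proposed sync-versions.py layout.
KNOWN_LANGUAGES = {"python", "rust", "cpp", "nodejs", "go"}


def validate_manifest(manifest) -> list:
    """Return every problem found; an empty list means the manifest looks valid."""
    if not isinstance(manifest, dict):
        return ["top level of .repokit.json must be a JSON object"]
    errors = []
    language = manifest.get("language")
    if not isinstance(language, str) or language not in KNOWN_LANGUAGES:
        errors.append(f'"language" must be one of {sorted(KNOWN_LANGUAGES)}, got {language!r}')
    targets = manifest.get("version-targets", [])
    if not isinstance(targets, list):
        return errors + ['"version-targets" must be a list']
    for i, target in enumerate(targets):
        where = f"version-targets[{i}]"
        if not isinstance(target, dict) or not isinstance(target.get("path"), str):
            errors.append(f'{where}: each target needs a string "path"')
            continue
        where += f' ({target["path"]})'
        encoding = target.get("encoding", "utf-8")
        try:
            codecs.lookup(encoding)
        except (LookupError, TypeError):
            errors.append(f"{where}: unknown encoding {encoding!r}")
        patterns = target.get("patterns")
        if not isinstance(patterns, list) or not patterns or not all(isinstance(p, str) for p in patterns):
            errors.append(f'{where}: "patterns" must be a non-empty list of strings')
    return errors
```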

Rust backend

  • --bump patch updates Cargo.toml [package] version correctly
  • Tag default is v<X>.<Y>.<Z>
  • CHANGELOG compare links work for SemVer tags
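SemVer compare links need the previous tag in SemVer order, which plain string sorting gets wrong: it places v0.2.10 before v0.2.2. The sketch below assumes tags of the form vX.Y.Z or vX.Y.Z-prerelease and the [label]: <repo>/compare/prev...new link shape shown in the Problem section. Build metadata (+...) is not handled, and the function names are hypothetical.

```python
import re

_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def semver_key(tag: str) -> tuple:
    """Sort key implementing SemVer 2.0.0 precedence for vX.Y.Z[-pre] tags."""
    m = _TAG.match(tag)
    if m is None:
        raise ValueError(f"not a SemVer tag: {tag!r}")
    core = tuple(int(part) for part in m.group(1, 2, 3))
    if m.group(4) is None:
        return core + ((1,),)  # a release outranks all of its pre-releases
    # Numeric identifiers compare numerically and sort before alphanumeric ones.
    ids = tuple((0, int(p), "") if p.isdigit() else (1, 0, p) for p in m.group(4).split("."))
    return core + ((0,) + ids,)


def compare_link(repo_url: str, tags: list, new_tag: str) -> str:
    """Build '[X.Y.Z]: <repo>/compare/<previous>...<new>' using SemVer order."""
    older = [t for t in tags if semver_key(t) < semver_key(new_tag)]
    if not older:
        raise ValueError(f"no tag precedes {new_tag}")
    previous = max(older, key=semver_key)
    return f"[{new_tag[1:]}]: {repo_url.rstrip('/')}/compare/{previous}...{new_tag}"
```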

C / C++ backend

  • Reading a top-level VERSION file as source of truth
  • Propagating to multiple .rc / .lrf files without corrupting the codepage-1252 © byte (verified by counting 0xA9 bytes before and after, and by comparing SHA256 hashes against known-good reference files)
  • Propagating to .cpp source-string literals (e.g., szVersionStr="...")
  • Tag format configurable via manifest (e.g., v${MAJOR}.${MINOR}-${PHASE}-build${BUILD_MAJOR}.${BUILD_MINOR})
  • Multi-track build counters supported (CLI counter + GUI counter)
  • Acceptance test: git subtree pull of this script into DazzleTools/Dazzle-Locate64 lets python scripts/sync-versions.py --bump build reproduce the manual hand-bump that was performed in Dazzle-Locate64@45eec7f byte-for-byte
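The 0xA9 criterion above could be checked with a hypothetical test helper like the one below. It compares the count of 0xA9 bytes before and after propagation and rejects any newly introduced UTF-8 replacement sequence (0xEF 0xBF 0xBD). The byte-for-byte criterion against Dazzle-Locate64@45eec7f would be a separate equality (or SHA256) comparison against the files at that commit.

```python
REPLACEMENT_UTF8 = b"\xef\xbf\xbd"  # U+FFFD as UTF-8: the signature of a lossy round-trip


def check_cp1252_intact(before: bytes, after: bytes) -> None:
    """Raise AssertionError if propagation damaged a cp1252 file."""
    n_before, n_after = before.count(b"\xa9"), after.count(b"\xa9")
    if n_before != n_after:
        raise AssertionError(f"0xA9 (copyright) byte count changed: {n_before} -> {n_after}")
    if after.count(REPLACEMENT_UTF8) > before.count(REPLACEMENT_UTF8):
        raise AssertionError("propagation introduced U+FFFD replacement bytes")
```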

Node.js backend

  • --bump patch updates package.json "version" correctly
  • Tag default is v<X>.<Y>.<Z>

Go backend

  • --bump patch updates version.go const Version correctly (regex-based)
  • Documentation explains the -ldflags alternative

install-hooks.sh

  • Detects language from .repokit.json (or legacy Python path)
  • Pre-commit hook does NOT error on projects without _version.py
  • Same hook works across all supported languages

Documentation

  • docs/LANGUAGES.md exists with one section per supported language
  • Migration guide for existing Python consumers (likely "do nothing")
  • CONTRIBUTING note on how to add a new language backend

Regression tests

  • Codepage-1252 round-trip preserves 0xA9 byte (testable on a fixture file)
  • Each backend has at least one round-trip test (read → bump → check → assert)

Related issues

Analysis

The codepage-1252 PowerShell pattern that Dazzle-Locate64 used as a manual workaround is the canonical reference for what "correct codepage-aware propagation" looks like:

$enc = [System.Text.Encoding]::GetEncoding(1252)
$bytes = [System.IO.File]::ReadAllBytes($file)
$text = $enc.GetString($bytes)
$text2 = $text.Replace($oldString, $newString)
[System.IO.File]::WriteAllBytes($file, $enc.GetBytes($text2))

Python equivalent:

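# Read raw bytes: no implicit decoding, and CRLF line endings survive untouched.
with open(path, "rb") as f:
    raw = f.read()
# Decode with the declared codepage (strict: an undecodable byte raises instead of being replaced).
text = raw.decode("cp1252")
text = text.replace(old, new)
# Re-encode with the same codepage, so 0xA9 (©) is written back as 0xA9.
with open(path, "wb") as f:
    f.write(text.encode("cp1252"))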

The C++ backend should use this pattern verbatim for any target whose declared encoding is cp1252 (or any non-UTF-8 codepage).

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), epic (Multi-phase initiative spanning multiple sub-issues)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
