Multi-language refactor: sync-versions.py + install-hooks.sh should support Python, Rust, C/C++, Node.js, Go (epic)
Problem
sync-versions.py and install-hooks.sh currently hardcode the Python-project shape:
- A single source of truth at <package>/_version.py containing MAJOR, MINOR, PATCH, PHASE constants
- Configuration loaded from pyproject.toml [tool.repokit-common]
- A CHANGELOG.md with PEP-440-style compare links ([0.2.3]: <repo>/compare/v0.2.2...v0.2.3)
- Tag format vX.Y.Z or vX.Y.Za1 (PEP 440)
This shape doesn't fit:
- Rust: version lives in Cargo.toml [package] version = "..."; no Python package, may have no pyproject.toml
- C / C++: version may live in .rc resource files (Windows), a top-level VERSION file, or a custom version.h header. .rc and .lrf files are typically Windows-1252 encoded, which makes naive UTF-8 round-trips destructive (the © byte 0xA9 is not valid UTF-8 on its own, so a lossy decode replaces it with U+FFFD, which is written back as 0xEF 0xBF 0xBD)
- Node.js: version lives in the package.json "version" field
- Go: version is typically embedded by -ldflags at build time; if a file holds it, it's a .go constant, not a Python module
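To make the encoding hazard concrete, here is a minimal sketch of the destructive round-trip next to the safe one. It assumes the naive tool decodes with errors="replace" (a strict decode would raise UnicodeDecodeError instead); the function names and the sample bytes are hypothetical.

```python
def naive_utf8_round_trip(raw: bytes) -> bytes:
    """Lossy path: invalid UTF-8 bytes become U+FFFD before re-encoding."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


def cp1252_round_trip(raw: bytes) -> bytes:
    """Safe path: decode and encode with the file's real codepage."""
    return raw.decode("cp1252").encode("cp1252")


# Hypothetical .rc fragment containing the cp1252 copyright byte 0xA9.
original = b"Copyright \xa9 Example"
damaged = naive_utf8_round_trip(original)
preserved = cp1252_round_trip(original)
```

The damaged result is two bytes longer than the input because one byte was replaced by a three-byte sequence, so even a byte-count check catches this class of corruption.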
Concrete failure: a downstream consumer (DazzleTools/Dazzle-Locate64, a C++ project — see Dazzle-Locate64#8) cannot use sync-versions.py at all today. Running it errors:
FileNotFoundError: Cannot find $PACKAGE_NAME/_version.py
…because there is no Python package. The maintainer hand-bumped 9 .rc / .lrf / .cpp files via PowerShell using a codepage-1252-safe replacement loop, working around the script entirely. Every release will need this manual hand-bump until sync-versions.py learns the C++ project shape.
The pre-commit hook installed by install-hooks.sh would also fail on every commit in such a project, because it calls sync-versions.py --auto.
This problem repeats for any future Rust / Node.js / Go / mixed consumer. The longer the script stays Python-only, the more downstream consumers will either fork the script locally (defeating the point of a shared subtree) or skip versioning automation entirely.
Why it matters
git-repokit-common is consumed as a git subtree by multiple downstream projects. Today's sync-versions.py is fine for Python projects; for everyone else it's dead weight. Without a multi-language refactor:
- C++ / Rust / etc. consumers must hand-edit every release (error-prone, easy to forget files)
- Pre-commit hooks must be commented out or omitted
- Each consumer reinvents its own version-bump logic
- install-hooks.sh becomes language-specific by accident
Proposed solution
Introduce a language-flavor backend system in sync-versions.py. Existing Python consumers keep working with zero config changes; non-Python consumers declare their flavor in a small manifest and sync-versions.py dispatches to the right backend.
The high-level shape:
sync-versions.py
├── core: project root discovery, --check / --bump / --set / --dry-run / --auto
├── language detection: read .repokit.json (new) or fall back to current Python heuristics
└── backends:
    ├── python (existing _version.py + pyproject.toml flow, unchanged for current users)
    ├── rust (Cargo.toml [package] version)
    ├── cpp (top-level VERSION file + .rc / .lrf / .cpp string-literal propagation)
    ├── nodejs (package.json "version")
    └── go (version.go const or `-ldflags` informational)
Same CLI surface for every language:
python scripts/sync-versions.py --check
python scripts/sync-versions.py --bump patch
python scripts/sync-versions.py --bump minor
python scripts/sync-versions.py --bump major
python scripts/sync-versions.py --bump build # NEW — useful for C++ build-counter scheme
python scripts/sync-versions.py --set 1.2.3
python scripts/sync-versions.py --auto # for pre-commit hook
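The flag set above could be declared once and shared by every backend. The following is a sketch only: it assumes argparse, assumes the four modes are mutually exclusive, and uses a hypothetical destination name (new_version) that is not taken from the existing script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Language-agnostic CLI: backends never define their own flags."""
    parser = argparse.ArgumentParser(prog="sync-versions.py")
    mode = parser.add_mutually_exclusive_group()  # assumption: one mode per run
    mode.add_argument("--check", action="store_true",
                      help="report drift without writing anything")
    mode.add_argument("--bump", choices=["major", "minor", "patch", "build"],
                      help="increment one version component")
    mode.add_argument("--set", dest="new_version", metavar="VERSION",
                      help="set an explicit version")
    mode.add_argument("--auto", action="store_true",
                      help="mode used by the pre-commit hook")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would change without writing")
    return parser


args = build_parser().parse_args(["--bump", "build", "--dry-run"])
```

Whether a given backend accepts every --bump choice (for example, build on a SemVer project) is a validation step that would belong in the backend, not in the parser.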
Language flavor declaration
Today, Python consumers use pyproject.toml [tool.repokit-common]. Other languages may have no pyproject.toml at all (and adding one purely for tooling is awkward).
Proposal: introduce an optional top-level .repokit.json (or .repokit-common.toml) manifest:
{
  "language": "cpp",
  "version-source": "VERSION",
  "version-targets": [
    {"path": "vendor/locate32-trunk/Locate/Locate32/commonres.rc", "encoding": "cp1252",
     "patterns": ["FILEVERSION ${MAJOR},${MINOR},${BUILD_MAJOR},${BUILD_MINOR}",
                  "PRODUCTVERSION ${MAJOR},${MINOR},${BUILD_MAJOR},${BUILD_MINOR}",
                  "\"${MAJOR}.${MINOR} ${PHASE} build ${BUILD_MAJOR}.${BUILD_MINOR}\""]},
    {"path": "vendor/locate32-trunk/Locate/locate/Locate.cpp", "encoding": "utf-8",
     "patterns": ["szVersionStr=\"locate ${MAJOR}.${MINOR} ${PHASE} build ${BUILD_MAJOR}.${BUILD_MINOR_CLI}\""]}
  ],
  "tag-format": "v${MAJOR}.${MINOR}-${PHASE}-build${BUILD_MAJOR}.${BUILD_MINOR}",
  "changelog": {
    "path": "CHANGELOG.md",
    "compare-link-format": "human"
  }
}
For Python consumers, .repokit.json is optional — if absent, the existing pyproject.toml [tool.repokit-common] lookup still applies. This is the backward-compat guarantee.
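The ${NAME} placeholders in the manifest happen to match the syntax of Python's string.Template, so rendering a pattern or tag format may need very little code. This sketch assumes that mapping; the helper name render and the sample component values are illustrative, chosen to reproduce the v3.2-RC4-build11.8221 tag mentioned under Design considerations.

```python
from string import Template


def render(template: str, components: dict) -> str:
    """Fill ${NAME} placeholders; a missing component raises KeyError."""
    return Template(template).substitute(
        {name: str(value) for name, value in components.items()}
    )


# Illustrative component values, not read from any real VERSION file.
components = {"MAJOR": 3, "MINOR": 2, "PHASE": "RC4",
              "BUILD_MAJOR": 11, "BUILD_MINOR": 8221}

tag = render("v${MAJOR}.${MINOR}-${PHASE}-build${BUILD_MAJOR}.${BUILD_MINOR}",
             components)
file_version = render(
    "FILEVERSION ${MAJOR},${MINOR},${BUILD_MAJOR},${BUILD_MINOR}", components)
```

Using substitute rather than safe_substitute means a typo in a placeholder fails loudly instead of leaving a literal ${...} in a release artifact. A pattern that needs a literal dollar sign would have to write it as $$.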
Language matrix
| Language | Source of truth (default) | Encoding hazards | Tag default |
|---|---|---|---|
| Python | <package>/_version.py (MAJOR/MINOR/PATCH/PHASE) | UTF-8, low risk | PEP 440 (v0.2.3a1) |
| Rust | Cargo.toml [package] version = "X.Y.Z" | UTF-8, low risk | SemVer (v0.2.3) |
| C / C++ | Top-level VERSION file (key=value) | Windows-1252 for .rc / .lrf propagation targets; must NEVER round-trip through UTF-8 | Configurable; consumers may want vX.Y-RCN-buildA.B |
| Node.js | package.json "version" | UTF-8, JSON-encoded | SemVer |
| Go | version.go const Version = "X.Y.Z" (optional) | UTF-8, low risk | SemVer |
Implementation approach
Phase 1 — extract the language abstraction
- Refactor sync-versions.py to delegate to a Backend interface:

    from __future__ import annotations  # Target is only introduced in Phase 2

    class Backend:
        # Source-of-truth access: version components as a name -> value mapping.
        def read_components(self) -> dict: ...
        def write_components(self, components: dict, dry_run: bool) -> bool: ...
        # Propagation: files that repeat the version and must stay in sync.
        def get_propagation_targets(self) -> list[Target]: ...
        def write_target(self, target: Target, version_string: str, dry_run: bool) -> bool: ...
        # Formatting: git tag and human-readable version string.
        def format_tag(self, components: dict) -> str: ...
        def format_human_version(self, components: dict) -> str: ...

- Existing Python logic moves into PythonBackend with no behavior change
- Add a detect_backend() function that reads .repokit.json first, then falls back to legacy pyproject.toml [tool.repokit-common] Python detection
- Existing --check, --bump, --set, --dry-run, --auto flags work for any backend
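A minimal sketch of the detection order described above. It assumes the manifest's "language" key is the only input, treats a missing manifest as the legacy Python path (leaving the pyproject.toml lookup to the Python backend itself), and uses a hypothetical KNOWN_BACKENDS set mirroring the backend names in the high-level shape.

```python
import json
from pathlib import Path

# Assumption: backend names match the tree in "Proposed solution".
KNOWN_BACKENDS = {"python", "rust", "cpp", "nodejs", "go"}


def detect_backend(root: Path) -> str:
    """Return the backend name for the project rooted at `root`."""
    manifest = root / ".repokit.json"
    if not manifest.is_file():
        # Backward compatibility: no manifest means the legacy Python flow.
        return "python"
    data = json.loads(manifest.read_text(encoding="utf-8"))
    language = data.get("language")
    if language not in KNOWN_BACKENDS:
        raise ValueError(f"unsupported language in {manifest}: {language!r}")
    return language
```

Failing on an unknown language, rather than silently falling back to Python, keeps a typo in the manifest from triggering the FileNotFoundError described in the Problem section.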
Phase 2 — codepage-aware target writer
- New Target class carries an optional encoding field (defaulting to UTF-8)
- Write path always reads bytes, decodes with declared encoding, performs string-replace, encodes with same encoding, writes bytes; it never opens files in text mode
- Add a regression test that round-trips 0xA9 © through codepage-1252 without corruption
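The write path and its regression check could look roughly like this. The helper name replace_in_file and its signature are assumptions, not part of the existing script; the point is that only bytes cross the file boundary, so line endings and the 0xA9 byte survive untouched.

```python
from pathlib import Path


def replace_in_file(path: Path, old: str, new: str,
                    encoding: str = "utf-8", dry_run: bool = False) -> bool:
    """Replace `old` with `new` using the declared encoding; True if changed."""
    raw = path.read_bytes()          # bytes in: no newline translation
    text = raw.decode(encoding)      # strict decode: fails loudly on bad input
    updated = text.replace(old, new)
    if updated == text:
        return False
    if not dry_run:
        path.write_bytes(updated.encode(encoding))  # bytes out, same codepage
    return True
```

A fixture for the regression test only needs a few bytes: one 0xA9, one CRLF, and one version string to replace. Comparing the 0xA9 count before and after is enough to detect the corruption described in the Problem section.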
Phase 3 — Rust backend
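- Reads Cargo.toml via tomli / tomllib (both are parse-only, so the update below needs a separate write step that preserves the file's formatting and comments)
- Updates the [package] version field
- Optionally syncs to Cargo.lock (requires a cargo update -p <pkg> invocation; leave as a follow-on)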
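Because tomllib cannot write, one option for the update is a targeted line edit that only touches version inside [package]. This is a sketch under that assumption, with a hypothetical function name; it deliberately ignores version keys in other tables such as dependency tables.

```python
import re

# Matches [table] and [[array-of-tables]] headers, capturing the table name.
SECTION_RE = re.compile(r"^\[+\s*(?P<name>[^\[\]]+?)\s*\]+\s*(#.*)?$")
# Matches a `version = "..."` assignment; groups 1 and 2 are the two quotes' context.
VERSION_RE = re.compile(r'^(\s*version\s*=\s*")[^"]*(")')


def set_cargo_version(toml_text: str, new_version: str) -> str:
    """Rewrite only the [package] version line, leaving all other text intact."""
    lines = toml_text.splitlines(keepends=True)
    section = None
    for index, line in enumerate(lines):
        header = SECTION_RE.match(line.strip())
        if header:
            section = header.group("name")
            continue
        if section == "package":
            replaced, count = VERSION_RE.subn(
                lambda m: m.group(1) + new_version + m.group(2), line, count=1)
            if count:
                lines[index] = replaced
                return "".join(lines)
    raise ValueError("no version key found under [package]")
```

A manifest that inherits its version from a workspace (version.workspace = true) does not match the pattern and ends in the ValueError, which seems preferable to guessing.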
Phase 4 — C / C++ backend
- Reads VERSION (key=value file) as the source of truth
- Iterates version-targets in .repokit.json, applying string substitution with codepage awareness
- Supports the multi-track build-number scheme used by Locate32 (separate counters for CLI vs GUI vs language pack)
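The VERSION file format is only specified as key=value, so the following parser is a sketch under assumptions: one pair per line, blank lines and # comments ignored, and component names matching the placeholders used in the manifest example. Both function names are hypothetical.

```python
def parse_version_file(text: str) -> dict[str, str]:
    """Parse KEY=value lines into a component mapping (values stay strings)."""
    components = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"VERSION line {number} is not key=value: {line!r}")
        components[key.strip()] = value.strip()
    return components


def bump_counter(components: dict[str, str], key: str) -> dict[str, str]:
    """Return a copy with one numeric counter incremented; other tracks untouched."""
    bumped = dict(components)
    bumped[key] = str(int(components[key]) + 1)
    return bumped
```

Keeping each build track as its own key (for example BUILD_MINOR for the GUI and BUILD_MINOR_CLI for the command-line tool, as in the manifest example) lets --bump build advance one counter without disturbing the others.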
Phase 5 — Node.js backend
- Reads / writes the package.json "version" field via JSON
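A sketch of the JSON round-trip. It assumes two-space indentation and a trailing newline on output (the layout npm itself writes); files formatted differently would be reformatted by this approach, which a real backend may want to avoid. The function name is hypothetical.

```python
import json


def set_package_json_version(text: str, new_version: str) -> str:
    """Return package.json text with only the "version" value changed."""
    data = json.loads(text)  # dict preserves the original key order
    if "version" not in data:
        raise KeyError('package.json has no "version" field')
    data["version"] = new_version
    # ensure_ascii=False keeps non-ASCII author names etc. readable.
    return json.dumps(data, indent=2, ensure_ascii=False) + "\n"
```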
Phase 6 — Go backend
- Reads / writes version.go const Version (regex-based)
- Documents the -ldflags alternative for users who don't want a version file
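The regex-based edit might look like the sketch below. It assumes a single top-level declaration of the form const Version = "X.Y.Z" (optionally typed as string) and does not handle grouped const ( ... ) blocks; the function name is hypothetical.

```python
import re

# Group 1: everything up to the opening quote; group 2: the closing quote.
GO_VERSION_RE = re.compile(
    r'^(const\s+Version\s*(?:string\s*)?=\s*")[^"]*(")', re.MULTILINE)


def set_go_version(source: str, new_version: str) -> str:
    """Rewrite the first `const Version = "..."` declaration in a Go file."""
    updated, count = GO_VERSION_RE.subn(
        lambda m: m.group(1) + new_version + m.group(2), source, count=1)
    if count == 0:
        raise ValueError("no `const Version` declaration found")
    return updated
```

A callable replacement is used instead of a backreference string so that a version beginning with a digit cannot be misread as part of a group number.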
Phase 7 — install-hooks.sh language detection
- Hook script detects language via the same .repokit.json lookup
- For Python: existing pre-commit hook calls sync-versions.py --auto
- For other languages: same call (the dispatcher handles it transparently)
- Pre-commit hook never errors on a project where _version.py is absent (the script itself decides what to do)
Phase 8 — Documentation
- New docs/LANGUAGES.md documenting each backend, the manifest format, and how to add a new language
- Update README.md to mention multi-language support
- Migration guide for existing Python consumers (TL;DR: do nothing; or optionally adopt .repokit.json for clarity)
Design considerations
- Codepage 1252 is non-negotiable for .rc/.lrf files. A file that contains © (0xA9) WILL be corrupted by a naive UTF-8 round-trip. The script must read bytes, decode in the declared encoding, and write bytes back. Add this as a regression test.
- Backward compatibility is paramount. Every existing Python consumer must continue to work without any config change. This means: .repokit.json is optional, the legacy pyproject.toml path stays the default for Python, and every existing CLI flag behaves identically.
- Tag format flexibility. Python uses PEP 440; Rust / Node.js / Go use SemVer; C++ projects (especially forks of older codebases) may use bespoke schemes like v3.2-RC4-build11.8221. Make the tag format a string template parameterized by version components.
- Component naming. Python uses MAJOR/MINOR/PATCH/PHASE/PRE_RELEASE_NUM. C++ may use MAJOR/MINOR/BUILD_MAJOR/BUILD_MINOR/PHASE. The component set must be language-flexible: backends declare their own component schema.
- Atomic writes. When propagating across many files, partial-write failures are dangerous. Stage every new file first (on the same filesystem as its target, so the final rename is atomic), fsync, then rename, or use a single transaction-style approach. Don't half-update a project's files.
- Encoding detection vs declaration. Detection (chardet) is unreliable. Make the manifest authoritative: if a target is cp1252, declare it; the script trusts the manifest.
- Out of scope for this issue: auto-detecting the language from filesystem heuristics (e.g., "if Cargo.toml exists, assume Rust"). Explicit declaration via manifest is more reliable. Heuristic detection can be a follow-on.
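The "stage, fsync, rename" idea from the atomic-writes bullet can be sketched as follows. The function name is hypothetical, and this is one possible shape rather than a full transaction: each individual rename is atomic, but the group of renames is not, so the sketch only guarantees that nothing is touched unless every file was staged successfully.

```python
import os
import tempfile
from pathlib import Path


def write_all_or_nothing(updates: dict[Path, bytes]) -> None:
    """Stage every payload next to its target, then swap them all in."""
    staged: list[tuple[Path, Path]] = []
    try:
        for target, payload in updates.items():
            # Same directory as the target, so os.replace stays on one filesystem.
            fd, tmp_name = tempfile.mkstemp(
                dir=str(target.parent), prefix=target.name + ".", suffix=".tmp")
            staged.append((Path(tmp_name), target))
            with os.fdopen(fd, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # data is on disk before any rename
        for tmp, target in staged:
            os.replace(tmp, target)        # atomic per file
    except BaseException:
        for tmp, _ in staged:
            tmp.unlink(missing_ok=True)    # drop leftovers; already-renamed ones are gone
        raise
```

Staging failures (a missing directory, a full disk, an encoding error upstream) therefore leave every original file byte-identical, which is the common failure case during propagation.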
Acceptance criteria
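Python (regression: must not break)
- sync-versions.py --check and --bump <part> work identically on existing Python projects with no manifest change
- pyproject.toml [tool.repokit-common] continues to be honored
- Pre-commit hook continues to call sync-versions.py --auto and stage the version file
Manifest
- .repokit.json schema documented in docs/LANGUAGES.md
- If both .repokit.json and pyproject.toml [tool.repokit-common] are present, .repokit.json wins (with a warning)
Rust backend
- --bump patch updates Cargo.toml [package] version correctly
- Default tag format is v<X>.<Y>.<Z>
C / C++ backend
- Reads the VERSION file as source of truth
- Propagates to .rc/.lrf files without corrupting the codepage-1252 © byte (verified by SHA256 / byte-count of 0xA9 before and after)
- Propagates to .cpp source-string literals (e.g., szVersionStr="...")
- Supports a custom tag format (e.g., v${MAJOR}.${MINOR}-${PHASE}-build${BUILD_MAJOR}.${BUILD_MINOR})
- A git subtree pull of this script into DazzleTools/Dazzle-Locate64 lets python scripts/sync-versions.py --bump build reproduce the manual hand-bump that was performed in Dazzle-Locate64@45eec7f byte-for-byte
Node.js backend
- --bump patch updates package.json "version" correctly
- Default tag format is v<X>.<Y>.<Z>
Go backend
- --bump patch updates version.go const Version correctly (regex-based)
- Documents the -ldflags alternative
install-hooks.sh
- Detects the language via .repokit.json (or the legacy Python path)
- Pre-commit hook does not error on a project without _version.py
Documentation
- docs/LANGUAGES.md exists with one section per supported language
Regression tests
- Codepage-1252 round-trip preserves the 0xA9 byte (testable on a fixture file)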
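Related issues
- Dazzle-Locate64#8 (the manual hand-bump was performed in Dazzle-Locate64@45eec7f)
- Downstream consumers need a git subtree pull to pick up new backends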
Analysis
The codepage-1252 PowerShell pattern that Dazzle-Locate64 used as a manual workaround is the canonical reference for what "correct codepage-aware propagation" looks like:
$enc = [System.Text.Encoding]::GetEncoding(1252)
$bytes = [System.IO.File]::ReadAllBytes($file)
$text = $enc.GetString($bytes)
$text2 = $text.Replace($oldString, $newString)
[System.IO.File]::WriteAllBytes($file, $enc.GetBytes($text2))
Python equivalent:
with open(path, "rb") as f:
    raw = f.read()
text = raw.decode("cp1252")
text = text.replace(old, new)
with open(path, "wb") as f:
    f.write(text.encode("cp1252"))
The C++ backend should use this pattern for any target whose declared encoding is cp1252, and the same pattern with the declared encoding substituted for any other non-UTF-8 codepage.