refactor: Tier 1 & 2 code quality improvements (v3.3.0 baseline)#4
Merged
refactor: Tier 1 & 2 code quality improvements (v3.3.0 baseline)#4
Conversation
- Add detail::compute_weighted_stats / compute_weighted_mean helpers (include/libhmm/math/weighted_stats.h + src/common/weighted_stats.cpp); replace duplicated weighted fit preambles in beta, uniform, chi_squared, exponential, and poisson distributions - Normalise tool includes: replace direct gaussian_distribution.h with distributions.h umbrella in all six tools - FileIOManager: extract do_write() helper; eliminate duplicated write/append scaffolding in writeTextFile and writeLines - NumericalDiagnostics: extract classify_value() and append_base_recommendations() helpers; use std::any_of in generateHealthReport; add Doxygen comments - Add detail::parse_named_param to distribution_io_utils.h for future operator>> cleanup - Remove incomplete fromString() from ChiSquaredDistribution and StudentTDistribution (inconsistent with library IO conventions; unused) - ChiSquaredDistribution and StudentTDistribution: replace redundant explicit copy/move special members with = default (DistributionBase handles atomic) - Trainer: add protected apply_emission_fits() helper; replace duplicated M-step emission fit loops in BaumWelchTrainer and ViterbiTrainer Co-Authored-By: Oz <oz-agent@warp.dev>
- Calculator base: add precompute_log_transitions() protected static helper; reduce both precomputeLogTransitions() member functions to one-liners - ForwardBackwardCalculator: extract init_log_forward() and init_log_backward() anonymous-namespace helpers; eliminate duplicated t=0 setup in pairwise and MaxReduce forward/backward variants - GammaDistribution + WeibullDistribution: extract apply_fit_params(mean,var) private method in each; eliminate duplicated parameter-assignment and validation tail shared between unweighted and weighted fit() overloads; also = default redundant explicit copy/move special members (same fix as ChiSquared/StudentT in Tier 1) - ViterbiTrainer: extract accum_sequence() (try/catch + stats accumulation) and normalize_and_commit() (pi/trans normalization) private helpers; runIteration() CC drops from 17 to 5 - BaumWelchTrainer: extract accumulate_gamma(), accumulate_xi(), m_step_pi(), and m_step_transitions() private helpers; train() CC drops from 30 to ~11 Co-Authored-By: Oz <oz-agent@warp.dev>
All concrete distributions had explicit Rule-of-Five copy/move implementations identical to what the compiler generates. DistributionBase fully handles the std::atomic<bool> cache flag in its own Rule of Five, so = default is correct for all derived classes regardless of whether their own members are scalars, std::array, or std::vector. Distributions updated: Gaussian, Exponential, Beta, LogNormal, Pareto, Uniform, Poisson, NegativeBinomial, Rayleigh, Binomial, Discrete. Combined with the Tier 1/2 fixes (ChiSquared, StudentT, Gamma, Weibull), all 15 concrete distributions now use = default for copy/move. Co-Authored-By: Oz <oz-agent@warp.dev>
/utf-8 declares source and execution character sets as UTF-8. Eliminates C4566 warnings on Windows systems with a non-UTF-8 ANSI code page (e.g. CP1252) when source files contain non-ASCII Unicode characters such as Greek letters in examples. Applied via add_compile_options - covers all targets safely. Co-Authored-By: Oz <oz-agent@warp.dev>
Formatting was not enforced locally before this session. All modified C++ headers and sources now conform to the project .clang-format rules. Co-Authored-By: Oz <oz-agent@warp.dev>
normalize_and_commit calls hmm.setPi() and hmm.setTrans() which are not noexcept. Marking the wrapper noexcept was incorrect and flagged by cppcheck (throwInNoexceptFunction). The original runIteration() code was not noexcept; the annotation was added in error during extraction. Co-Authored-By: Oz <oz-agent@warp.dev>
The bash script hook failed on Windows because /bin/bash is not available in the standard PATH. Replaced with an equivalent Python script (scripts/check-pragma-once.py) and switched the pre-commit config to language: python, which works on all platforms. The bash script is retained for reference but is no longer invoked. Co-Authored-By: Oz <oz-agent@warp.dev>
OldCrow
added a commit
that referenced
this pull request
May 5, 2026
CMakeLists.txt: VERSION 3.3.0 -> 3.4.0 include/libhmm/libhmm.h: @Version 3.4.0 CHANGELOG.md: add [3.4.0] entry covering all three refactor phases: - Phase 1 (PR #4): code quality - weighted stats helper, duplicate preamble elimination, BaumWelch CC 32->13, common.h split - Phase 2 (PR #5): common.h fan-in 24 reduction - Phase 3 (PR #6): JSON serializer, IO alignment bug fix, operator>> decomposition, sanitization, sample files README.md: - Version badge: 3.1.2 -> 3.4.0 - Tests badge: 36/36 -> 38/38 - Add I/O section (JSON save_json/load_json, legacy XML) - Add samples/ to project structure - Add io/ to include/libhmm/ directory listing - Update tools/ and tests/ descriptions Co-Authored-By: Oz <oz-agent@warp.dev>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Code quality refactoring based on a full metrics pass (lizard, cloc, PMD CPD, include fan-in/fan-out) against the v3.3.0 baseline. Implements Tier 1 and Tier 2 of the refactoring roadmap, plus opportunistic cleanup discovered during the work.
Metrics baseline → result
BaumWelchTrainer::train)Changes
Tier 1 — Low-risk duplication removal
detail::compute_weighted_stats/compute_weighted_mean): eliminates duplicated weighted preambles across 5 distributions (beta, uniform, chi_squared, exponential, poisson)distributions.humbrella header (removing stale direct Gaussian includes)FileIOManagerwrite-path deduplication: extracteddo_write()anonymous-namespace helperNumericalDiagnosticshelpers:classify_value(),append_base_recommendations(),report_has_issues()— plus Doxygen commentsfromString()fromChiSquaredDistributionandStudentTDistribution(inconsistent with library conventions, unused); addeddetail::parse_named_paramutility for futureoperator>>cleanupTrainer::apply_emission_fits()protected static helper shared byBaumWelchTrainerandViterbiTrainerTier 2 — Structural refactors
Calculator::precompute_log_transitions(): protected static helper eliminates identical 13-line implementations inForwardBackwardCalculatorandViterbiCalculatorinit_log_forward()/init_log_backward()in anonymous namespace, eliminating duplicated t=0 setup across pairwise and MaxReduce variantsGammaDistribution::apply_fit_params()/WeibullDistribution::apply_fit_params(): private helpers eliminating duplicated parameter-assignment tails between unweighted and weightedfit()overloadsViterbiTrainer::runIterationdecomposition:accum_sequence()+normalize_and_commit()helpers; CC drops from 17 to 5BaumWelchTrainer::traindecomposition:accumulate_gamma(),accumulate_xi(),m_step_pi(),m_step_transitions(); CC drops from 32 to 13Distribution Rule-of-Five normalisation
All 15 concrete distributions had explicit copy/move special members that were identical to what the compiler generates.
DistributionBasefully handlesstd::atomic<bool>in its own Rule of Five, so= defaultis correct throughout. Fixed in two passes (4 distributions in Tier 1/2 work, remaining 11 in a dedicated cleanup commit). Net: −488 lines of boilerplate.Build fix
/utf-8for MSVC viaadd_compile_optionsto suppress C4566 Unicode codepage warnings on CP1252 systemsTesting
37/37 tests passing on Windows (MSVC, AVX2). All changes are pure refactoring — no behavioural modifications.
Roadmap
This PR covers Phase 1. The refactoring plan continues with:
refactor/code-quality-phase2): Tier 3common/common.hcoupling split + Tier 4hmm.cpp::operator>>decomposition — to be done on M1/TahoePlan: https://app.warp.dev/drive/notebook/niAbtAHXXGBa2zwkv2pe6R
Session: https://app.warp.dev/conversation/ca60ff01-fa5e-45cf-be1a-5dd234ddd87c
Co-Authored-By: Oz oz-agent@warp.dev