Skip to content

refactor: Tier 1 & 2 code quality improvements (v3.3.0 baseline)#4

Merged
OldCrow merged 7 commits intomainfrom
refactor/code-quality-phase1
May 4, 2026
Merged

refactor: Tier 1 & 2 code quality improvements (v3.3.0 baseline)#4
OldCrow merged 7 commits intomainfrom
refactor/code-quality-phase1

Conversation

@OldCrow
Copy link
Copy Markdown
Owner

@OldCrow OldCrow commented May 4, 2026

Summary

Code quality refactoring based on a full metrics pass (lizard, cloc, PMD CPD, include fan-in/fan-out) against the v3.3.0 baseline. Implements Tier 1 and Tier 2 of the refactoring roadmap, plus opportunistic cleanup discovered during the work.

Metrics baseline → result

Metric v3.3.0 After this PR Δ
Lizard warnings 40 34 −6
Highest actionable CC 32 (BaumWelchTrainer::train) 13 −19
Production SLOC (src + include) ~9,526 9,002 −524 (−5.5%)
CPD duplications (75-token threshold) 21 significantly reduced

Changes

Tier 1 — Low-risk duplication removal

  • Weighted statistics helper (detail::compute_weighted_stats / compute_weighted_mean): eliminates duplicated weighted preambles across 5 distributions (beta, uniform, chi_squared, exponential, poisson)
  • Tool include cleanup: 6 tool files normalised to the distributions.h umbrella header (removing stale direct Gaussian includes)
  • FileIOManager write-path deduplication: extracted do_write() anonymous-namespace helper
  • NumericalDiagnostics helpers: classify_value(), append_base_recommendations(), report_has_issues() — plus Doxygen comments
  • Distribution I/O: removed incomplete fromString() from ChiSquaredDistribution and StudentTDistribution (inconsistent with library conventions, unused); added detail::parse_named_param utility for future operator>> cleanup
  • Trainer emission-fit loop: Trainer::apply_emission_fits() protected static helper shared by BaumWelchTrainer and ViterbiTrainer

Tier 2 — Structural refactors

  • Calculator::precompute_log_transitions(): protected static helper eliminates identical 13-line implementations in ForwardBackwardCalculator and ViterbiCalculator
  • FB forward/backward init helpers: init_log_forward() / init_log_backward() in anonymous namespace, eliminating duplicated t=0 setup across pairwise and MaxReduce variants
  • GammaDistribution::apply_fit_params() / WeibullDistribution::apply_fit_params(): private helpers eliminating duplicated parameter-assignment tails between unweighted and weighted fit() overloads
  • ViterbiTrainer::runIteration decomposition: accum_sequence() + normalize_and_commit() helpers; CC drops from 17 to 5
  • BaumWelchTrainer::train decomposition: accumulate_gamma(), accumulate_xi(), m_step_pi(), m_step_transitions(); CC drops from 32 to 13

Distribution Rule-of-Five normalisation

All 15 concrete distributions had explicit copy/move special members that were identical to what the compiler generates. DistributionBase fully handles std::atomic<bool> in its own Rule of Five, so = default is correct throughout. Fixed in two passes (4 distributions in Tier 1/2 work, remaining 11 in a dedicated cleanup commit). Net: −488 lines of boilerplate.

Build fix

  • Added /utf-8 for MSVC via add_compile_options to suppress C4566 Unicode codepage warnings on CP1252 systems

Testing

37/37 tests passing on Windows (MSVC, AVX2). All changes are pure refactoring — no behavioural modifications.

Roadmap

This PR covers Phase 1. The refactoring plan continues with:

  • Phase 2 (refactor/code-quality-phase2): Tier 3 common/common.h coupling split + Tier 4 hmm.cpp::operator>> decomposition — to be done on M1/Tahoe
  • Version bump to 3.4.0 when the bulk of the plan is complete

Plan: https://app.warp.dev/drive/notebook/niAbtAHXXGBa2zwkv2pe6R
Session: https://app.warp.dev/conversation/ca60ff01-fa5e-45cf-be1a-5dd234ddd87c

Co-Authored-By: Oz oz-agent@warp.dev

OldCrow and others added 7 commits May 3, 2026 18:47
- Add detail::compute_weighted_stats / compute_weighted_mean helpers
  (include/libhmm/math/weighted_stats.h + src/common/weighted_stats.cpp);
  replace duplicated weighted fit preambles in beta, uniform, chi_squared,
  exponential, and poisson distributions

- Normalise tool includes: replace direct gaussian_distribution.h with
  distributions.h umbrella in all six tools

- FileIOManager: extract do_write() helper; eliminate duplicated write/append
  scaffolding in writeTextFile and writeLines

- NumericalDiagnostics: extract classify_value() and append_base_recommendations()
  helpers; use std::any_of in generateHealthReport; add Doxygen comments

- Add detail::parse_named_param to distribution_io_utils.h for future
  operator>> cleanup

- Remove incomplete fromString() from ChiSquaredDistribution and
  StudentTDistribution (inconsistent with library IO conventions; unused)

- ChiSquaredDistribution and StudentTDistribution: replace redundant explicit
  copy/move special members with = default (DistributionBase handles atomic)

- Trainer: add protected apply_emission_fits() helper; replace duplicated
  M-step emission fit loops in BaumWelchTrainer and ViterbiTrainer

Co-Authored-By: Oz <oz-agent@warp.dev>
- Calculator base: add precompute_log_transitions() protected static helper;
  reduce both precomputeLogTransitions() member functions to one-liners

- ForwardBackwardCalculator: extract init_log_forward() and
  init_log_backward() anonymous-namespace helpers; eliminate duplicated
  t=0 setup in pairwise and MaxReduce forward/backward variants

- GammaDistribution + WeibullDistribution: extract apply_fit_params(mean,var)
  private method in each; eliminate duplicated parameter-assignment and
  validation tail shared between unweighted and weighted fit() overloads;
  also = default redundant explicit copy/move special members (same fix
  as ChiSquared/StudentT in Tier 1)

- ViterbiTrainer: extract accum_sequence() (try/catch + stats accumulation)
  and normalize_and_commit() (pi/trans normalization) private helpers;
  runIteration() CC drops from 17 to 5

- BaumWelchTrainer: extract accumulate_gamma(), accumulate_xi(),
  m_step_pi(), and m_step_transitions() private helpers; train() CC
  drops from 30 to ~11

Co-Authored-By: Oz <oz-agent@warp.dev>
All concrete distributions had explicit Rule-of-Five copy/move
implementations identical to what the compiler generates. DistributionBase
fully handles the std::atomic<bool> cache flag in its own Rule of Five, so
= default is correct for all derived classes regardless of whether their own
members are scalars, std::array, or std::vector.

Distributions updated: Gaussian, Exponential, Beta, LogNormal, Pareto,
Uniform, Poisson, NegativeBinomial, Rayleigh, Binomial, Discrete.

Combined with the Tier 1/2 fixes (ChiSquared, StudentT, Gamma, Weibull),
all 15 concrete distributions now use = default for copy/move.

Co-Authored-By: Oz <oz-agent@warp.dev>
/utf-8 declares source and execution character sets as UTF-8.
Eliminates C4566 warnings on Windows systems with a non-UTF-8 ANSI
code page (e.g. CP1252) when source files contain non-ASCII Unicode
characters such as Greek letters in examples.

Applied via add_compile_options - covers all targets safely.

Co-Authored-By: Oz <oz-agent@warp.dev>
Formatting was not enforced locally before this session. All modified
C++ headers and sources now conform to the project .clang-format rules.

Co-Authored-By: Oz <oz-agent@warp.dev>
normalize_and_commit calls hmm.setPi() and hmm.setTrans() which are not
noexcept. Marking the wrapper noexcept was incorrect and flagged by
cppcheck (throwInNoexceptFunction). The original runIteration() code was
not noexcept; the annotation was added in error during extraction.

Co-Authored-By: Oz <oz-agent@warp.dev>
The bash script hook failed on Windows because /bin/bash is not
available in the standard PATH. Replaced with an equivalent Python
script (scripts/check-pragma-once.py) and switched the pre-commit
config to language: python, which works on all platforms.

The bash script is retained for reference but is no longer invoked.

Co-Authored-By: Oz <oz-agent@warp.dev>
@OldCrow OldCrow merged commit 045d472 into main May 4, 2026
7 checks passed
@OldCrow OldCrow deleted the refactor/code-quality-phase1 branch May 4, 2026 12:57
OldCrow added a commit that referenced this pull request May 5, 2026
CMakeLists.txt: VERSION 3.3.0 -> 3.4.0
include/libhmm/libhmm.h: @Version 3.4.0

CHANGELOG.md: add [3.4.0] entry covering all three refactor phases:
- Phase 1 (PR #4): code quality - weighted stats helper, duplicate
  preamble elimination, BaumWelch CC 32->13, common.h split
- Phase 2 (PR #5): common.h fan-in 24 reduction
- Phase 3 (PR #6): JSON serializer, IO alignment bug fix, operator>>
  decomposition, sanitization, sample files

README.md:
- Version badge: 3.1.2 -> 3.4.0
- Tests badge: 36/36 -> 38/38
- Add I/O section (JSON save_json/load_json, legacy XML)
- Add samples/ to project structure
- Add io/ to include/libhmm/ directory listing
- Update tools/ and tests/ descriptions

Co-Authored-By: Oz <oz-agent@warp.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant