
feat: modernize codebase for Python 3.9+ with pydantic validation#148

Open
tcoroller wants to merge 19 commits into main from 139-modernize-codebase

Conversation

@tcoroller
Member

  • Drop Python 3.8 support (EOL Oct 2024), require Python >=3.9
  • Add Python 3.12 classifier
  • Add pydantic>=2.0 as required dependency
  • Bump torch requirement to >=2.8
  • Replace validate_data.py with new pydantic-based validators.py
  • Add tools/__init__.py module
  • Refactor all metrics and loss modules to use new validators
  • Add hypothesis-based property tests in tests/strategies.py
  • Add shared test fixtures in tests/conftest.py
  • Add tests/test_validators.py for new validation logic
  • Update dev/environment.yml and run-unittests.sh
  • Update ruff target-version to py39
  • Add docs/package_overview.md
  • Add .hypothesis/ to .gitignore

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tcoroller tcoroller linked an issue Mar 17, 2026 that may be closed by this pull request
@tcoroller tcoroller closed this Mar 17, 2026
@tcoroller tcoroller reopened this Mar 17, 2026
tcoroller and others added 7 commits March 17, 2026 16:11
These packages are required by the modernized codebase but were missing
from dev/environment.yml, causing ModuleNotFoundError in CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the optional 'checks' / 'check' boolean parameter from all
loss functions, metric classes, and stats functions. Validation via
pydantic validators is now always performed (guarded only by the
existing torch.jit.is_scripting/is_tracing check for JIT compatibility).

Affected:
- loss/cox.py: neg_partial_log_likelihood, baseline_survival_function
- loss/weibull.py: neg_log_likelihood_weibull
- loss/survival.py: neg_log_likelihood, survival_function
- metrics/auc.py: Auc.__init__
- metrics/brier_score.py: BrierScore.__init__
- metrics/cindex.py: ConcordanceIndex.__init__
- stats/ipcw.py: get_ipcw
- stats/kaplan_meier.py: KaplanMeierEstimator.__call__

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
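The always-on validation guard described above can be sketched as follows. This is a minimal illustration with a placeholder function name and loss body, not the actual torchsurv code:

```python
import torch

def neg_log_likelihood_sketch(
    log_hz: torch.Tensor, event: torch.Tensor, time: torch.Tensor
) -> torch.Tensor:
    # Validation always runs in eager mode; it is skipped only while the
    # function is being scripted or traced, where arbitrary Python objects
    # (such as pydantic models) cannot execute.
    if not (torch.jit.is_scripting() or torch.jit.is_tracing()):
        if log_hz.shape[0] != time.shape[0] or event.shape[0] != time.shape[0]:
            raise ValueError("log_hz, event and time must have the same length")
    return -log_hz[event].sum()  # placeholder loss body
```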
The tests directory is not a package (no __init__.py), so pytest adds
it to sys.path directly. Use 'from strategies import' consistent with
how all other test files import from 'utils'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ions

Replace O(K*N) Python loops with fully vectorized tensor operations for
the standard (non-time-varying) Cox path:

- H/R list comprehensions → scatter_add (events per time) + reverse
  cumsum (suffix sum of exp(log_hz) for risk sets), reducing cost from
  O(K*N) loop-based torch.where calls to O(N) tensor ops
- Nested Efron for-loop (k, r) → padded (K, max_ties) tensor: compute
  all Efron denominator terms in parallel, mask padding with log(1)=0
- Breslow log_denominator stack → log(reverse_cumsum[first_idx])
- _cumulative_baseline_hazard: same reverse-cumsum trick

The time-varying extended Cox path retains the existing list-comprehension
approach (irregular 2-D indexing prevents general vectorization).
Strata loops in the outer functions are kept as-is (structurally required;
typically <=5 strata).

Numerical equivalence verified against the reference implementation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
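The two primitives behind this change can be illustrated on a toy batch. This is a sketch with made-up data; the variable names are illustrative, not the actual cox.py internals:

```python
import torch

# Toy batch, already sorted by ascending event time (with a tie at t=2)
log_hz = torch.tensor([0.5, -0.2, 0.1, 0.3])
event = torch.tensor([True, False, True, True])
time = torch.tensor([1.0, 2.0, 2.0, 3.0])
exp_hz = torch.exp(log_hz)

# Reverse cumsum = suffix sums: risk_set[i] is the total exp(log_hz) over
# everyone still at risk at time[i], replacing O(K*N) torch.where loops
risk_set = torch.flip(torch.cumsum(torch.flip(exp_hz, dims=[0]), dim=0), dims=[0])

# scatter_add: accumulate exp(log_hz) of event subjects per unique time,
# giving the per-time tie terms in one O(N) pass
time_unique, inverse = torch.unique(time, return_inverse=True)
denominator_ties = torch.zeros(time_unique.numel())
denominator_ties.scatter_add_(0, inverse[event], exp_hz[event])
```

For the Breslow denominator, taking `log(risk_set[first_idx])` at the first index of each unique event time then replaces the stacked per-time log-sums.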
1. test_brier_score_simulated_data (flaky): sksurv requires eval times
   strictly within [test_min, test_max) — filter new_time to exclude
   test_time.max() before passing to sksurv; skip IBS comparison when
   time grids differ to avoid integrating over mismatched intervals.

2. test_cox_equivalence (TorchScript): torch.jit.script rejects PEP 604
   'X | Y' union syntax — replace 'torch.Tensor | None' with
   Optional[torch.Tensor] in all three function signatures; add
   'from typing import Optional'. Suppress ruff UP045 since Optional
   is required for TorchScript compatibility.

3. test_training (MPS device): torch.searchsorted is not supported on
   Apple MPS — add _searchsorted() helper that runs on CPU and moves
   the result back to the original device; use it everywhere in cox.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each public class and function now has runnable Examples sections covering:
- happy-path usage with exact tensor output
- auto-coercion behaviour (e.g. int event → bool, model_type normalisation)
- error cases showing the ValidationError / ValueError message

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tcoroller tcoroller self-assigned this Mar 17, 2026
@tcoroller tcoroller added enhancement ✨ New feature or request labels Mar 17, 2026
@tcoroller tcoroller modified the milestone: 0.2.0 Mar 17, 2026
tcoroller and others added 2 commits March 17, 2026 19:54
…rict, ruff clean

- validators.py: Pydantic v2 models for all public inputs; TimeVaryingCoxInputs,
  impute_missing_log_shape in ModelInputs, helpers, base class; log_params/log_hz
  allow inf (legitimate model outputs)
- cox.py: Optional → X | None; TimeVaryingCoxInputs at call sites; fix
  baseline_survival return type; typed pll_tensor variable
- momentum.py: module-level _SurvivalTuple; Callable[..., Any]; deque[Any];
  type: ignore[misc] on torch decorator/subclass lines
- metrics/auc.py, cindex.py, brier_score.py: __init__ return None, instance attrs
  typed as X | None, assert guards in private methods, full method signatures
- stats/kaplan_meier.py: device: str | None, self.device: Any, return types, Iterator fix
- __init__.py files: __all__ + re-exports for root, loss, metrics, stats, tools
- conftest.py: collections.abc.Callable
- tests: ruff-clean (B017 specific exceptions, E712 bool idioms, UP006/UP035, F401/F841)
- pyproject.toml: follow_imports=skip for third-party; pandas-stubs; ruff test exclusion removed
- dev/codeqc.sh: now runs ruff format + ruff check + mypy
- dev/environment.yml: pandas-stubs via pip

Quality gates: mypy strict 0 errors, ruff clean, 87 tests pass, 85% coverage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tcoroller tcoroller marked this pull request as draft March 18, 2026 01:32
@tcoroller tcoroller marked this pull request as ready for review March 18, 2026 12:27
tcoroller and others added 5 commits March 18, 2026 14:16
* Migrate development tooling from conda to uv

- Make pyproject.toml the single source of truth for all dependencies
- Add missing test deps (lifelines, scikit-survival, lightning, loguru,
  torchvision, scikit-learn, pandas) to dev dependency group
- Add missing doc deps (nbsphinx, sphinxcontrib-bibtex, sphinx-math-dollar,
  ipython, matplotlib, pandas, lightning) to docs optional extra
- Add publish dependency group (build, twine)
- Migrate all CI workflows to use astral-sh/setup-uv + uv sync
- Update codeqc.yml, docs.yml, build-release.yml, torch-compatibility.yml
- Update CLAUDE.md, CONTRIBUTING.md, README.md, docs/devnotes.md
- Deprecate dev/environment.yml with notice pointing to uv
- Regenerate uv.lock with all new dependency groups

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix codeqc CI: run dev script via uv run

The shell script calls ruff/mypy which live in .venv and need
uv run to be on PATH.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix mypy type-ignore comments and pre-commit hook

- Update type: ignore from [misc] to [untyped-decorator] to match
  actual mypy error codes from newer mypy + torch versions
- Update pre-commit mypy hook: use 'uv run --no-sync mypy' with
  explicit 'src/' target. --no-sync prevents uv from rebuilding
  the editable install mid-hook (which confuses pre-commit).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* uv issue

* Add missing sphinx-book-theme to docs dependencies

The docs conf.py uses sphinx_book_theme but it was only in the
legacy environment.yml, not in pyproject.toml docs extras.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add pandoc to CI workflows that build docs

pandoc is a system dependency required by nbsphinx to convert
notebooks. Previously installed implicitly via conda; now needs
explicit apt-get install.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tcoroller
Member Author

PR Review Summary: Modernize Codebase + Migrate to uv

This PR has two major themes: (1) codebase modernization (Python 3.9+, Pydantic v2, typing, mypy strict) and (2) migration from conda to uv as the development toolchain. Here's a guided tour for reviewers.


🔴 High Attention — Review Carefully

1. Pydantic v2 Validators (src/torchsurv/tools/validators.py — NEW file, replaces validate_data.py)

This is the biggest behavioral change. The old validate_data.py (deleted) used manual checks; the new validators.py uses Pydantic v2 models for input validation throughout the library.

  • What to check: Validation logic, error messages, edge cases (empty tensors, NaNs, shape mismatches)
  • Impact: Every loss function and metric now runs inputs through Pydantic validators
  • Tests: tests/test_validators.py (new)
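A minimal sketch of what such a Pydantic v2 input model might look like (hypothetical names and checks; the real models in validators.py differ):

```python
import torch
from pydantic import BaseModel, ConfigDict, field_validator

class SurvivalInputs(BaseModel):
    # Tensors are not Pydantic-native types, so allow arbitrary types
    model_config = ConfigDict(arbitrary_types_allowed=True)

    event: torch.Tensor
    time: torch.Tensor

    @field_validator("event")
    @classmethod
    def coerce_event_to_bool(cls, v: torch.Tensor) -> torch.Tensor:
        # Mirrors the auto-coercion behaviour the PR mentions:
        # integer 0/1 indicators become bool
        return v if v.dtype == torch.bool else v.bool()

    @field_validator("time")
    @classmethod
    def time_is_finite_nonnegative(cls, v: torch.Tensor) -> torch.Tensor:
        if not torch.isfinite(v).all() or (v < 0).any():
            raise ValueError("time must be finite and non-negative")
        return v
```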

2. Cox Loss Vectorization (src/torchsurv/loss/cox.py)

The Cox partial log-likelihood computation was vectorized — replacing Python loops with tensor operations. This is a correctness-critical performance change.

  • What to check: Efron/Breslow tie handling, strata support, numerical equivalence with previous implementation
  • Tests: tests/test_cox.py (heavily modified — compare against lifelines/scikit-survival baselines)

3. Source Code — Type Annotations & __all__ (all files under src/torchsurv/)

Every module got strict mypy typing and explicit __all__ exports. All __init__.py files now have public API definitions.

  • What to check: Return types, Optional vs Union usage, that __all__ exports match the intended public API
  • Quick scan OK for: Most changes are additive annotations on existing signatures

🟡 Medium Attention — Verify Correctness

4. CI Workflows (.github/workflows/ — all 4 files)

All workflows migrated from conda → uv:

  • codeqc.yml: setup-miniconda → setup-uv + uv sync + uv run

  • docs.yml: Same pattern, adds apt-get install pandoc (system dep that conda handled implicitly)

  • build-release.yml: pip install build twine → uv build + uvx twine check

  • torch-compatibility.yml: pip → uv sync --override for PyTorch version matrix

  • What to check: That each workflow installs the right dependency groups (--group dev, --extra docs). The torch-compatibility override pattern (echo 'torch==X.Y.Z' > /tmp/torch-override.txt) is new.

5. pyproject.toml

Major changes:

  • requires-python raised from >=3.8 to >=3.9

  • pydantic>=2.0 added as a runtime dependency

  • [dependency-groups] dev expanded with test deps previously only in environment.yml (lifelines, scikit-survival, lightning, loguru, torchvision, etc.)

  • [project.optional-dependencies] docs expanded (sphinx-book-theme, nbsphinx, sphinxcontrib-bibtex, etc.)

  • New publish dependency group (build, twine)

  • What to check: That the dependency list is complete and version bounds are sensible. Pydantic as a runtime dep is a user-facing change.

6. Tests (all files under tests/)

  • New shared fixtures in conftest.py and strategies.py (Hypothesis strategies)

  • Test files updated for new validator API and type annotations

  • test_hypothesis.py heavily reworked

  • What to check: That test coverage is maintained. Spot-check a few test files against previous behavior.
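For a flavour of what Hypothesis-based strategies for survival data might look like, here is a hypothetical sketch (names and bounds are illustrative, not the actual strategies.py):

```python
import torch
from hypothesis import given, strategies as st

@st.composite
def survival_batches(draw):
    # Draw a batch of (event, time) pairs with finite, non-negative times
    size = draw(st.integers(min_value=2, max_value=32))
    event = draw(st.lists(st.booleans(), min_size=size, max_size=size))
    time = draw(st.lists(
        st.floats(min_value=0.0, max_value=1e3, allow_nan=False),
        min_size=size, max_size=size,
    ))
    return torch.tensor(event), torch.tensor(time)

@given(survival_batches())
def test_times_are_valid(batch):
    event, time = batch
    assert event.dtype == torch.bool
    assert bool((time >= 0).all())
```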


🟢 Low Attention — Quick Scan

7. Documentation Updates

  • docs/devnotes.md: uv is now the primary dev setup path; conda moved to collapsed legacy section. All command examples updated.
  • CONTRIBUTING.md: Already was uv-first, removed conda fallback line.
  • README.md: Dev setup section in <details> updated to use uv sync. End-user install (pip/conda) unchanged.
  • CLAUDE.md: Internal dev guidelines, commands updated to uv run.
  • New docs: docs/benchmarks.md, docs/package_overview.md, docs/notebooks/non_medical_applications.ipynb

8. .pre-commit-config.yaml

  • mypy hook changed from entry: mypy to entry: uv run --no-sync mypy with explicit src/ target
  • This was needed because mypy is inside .venv, not on system PATH

9. dev/environment.yml

  • Added deprecation notice header pointing to pyproject.toml + uv sync
  • File kept for backward compatibility but no longer maintained

10. Housekeeping

  • .gitignore additions
  • CODEOWNERS.md added
  • communications/ folder added
  • dev/*.sh scripts: minor cleanups (set -euo pipefail, PYTHONPATH fixes)
  • uv.lock: New lockfile (auto-generated, ~9k lines — no need to read)

Key Decisions to Validate

| Decision | Rationale |
| --- | --- |
| Pydantic v2 as runtime dependency | Adds ~2MB to install; enables structured validation |
| Drop Python 3.8 support | 3.8 reached EOL Oct 2024 |
| uv as sole dev toolchain | Eliminates dep duplication between pyproject.toml and environment.yml |
| Keep environment.yml (deprecated) | Soft transition for existing conda users |
| torch>=2.8 minimum | Required for newer torch features used in vectorized cox |

"\n",
"def make_loaders(X, event, time, test_size=0.2, val_size=0.1, batch_size=64):\n",
" \"\"\"Split into train/val/test and return DataLoaders.\"\"\"\n",
" X_tv, X_test, e_tv, e_test, t_tv, t_test = train_test_split(X, event, time, test_size=test_size, random_state=42)\n",

what does "tv" stand for?

" auc_metric = Auc()\n",
" auc_val = auc_metric(estimate=log_hz, event=event_test, time=time_test, new_time=median_t)\n",
"\n",
" # Brier Score\n",

why not add the Brier score?

"\n",
" # C-index (use negative log_scale as risk score — higher scale = longer survival)\n",
" ci = ConcordanceIndex()\n",
" c = ci(-log_scale, event_test, time_test)\n",

I disagree with this. We should pass the parameters to get the survival function and use that as the risk score, like in the main notebook. Happy to do it; just let me know if there was a particular reason to use -log_scale.

"name": "stdout",
"output_type": "stream",
"text": [
"Failure rate: 94.0% | Median observed hours: 118\n"

The failure rate seems very high and the median observed hours super low (less than 5 days of use). I propose to increase the baseline time, as I do in the commit.

"source": [
"# ── Synthetic Extreme Weather Data ───────────────────────────────────────────\n",
"rng = np.random.default_rng(4)\n",
"N = 600\n",

why N = 600 and not 800 like all other examples?

"Each section:\n",
"- Generates **synthetic data** with realistic distributions (fixed seeds for reproducibility)\n",
"- Points to a **real public dataset** you can use as a drop-in replacement\n",
"- Trains both a **Cox** or **Weibull AFT** model using a simple MLP backbone\n",

This is not true; you only train one of them.

@@ -11,87 +11,110 @@ pre-commit install


I've tried the new steps with uv and it all works.

return log_params


def validate_time_varying_log_hz(time_sorted: torch.Tensor, log_hz_sorted: torch.Tensor) -> None:

I think we can delete this function. It has been replaced by TimeVaryingCoxInputs.


# denominator_ties[k] = sum of exp(log_hz) for event subjects at time_unique[k]
denominator_ties = torch.zeros(K, device=device)
denominator_ties.scatter_add_(0, event_idx, exp_hz[event_sorted])

I believe the code was lighter and clearer before, with the if/else statements. Why the change?


### Contributors

* Yao Chen (**Novartis**)

Should we make the changes to the acknowledgments as we discussed?


Labels

enhancement ✨ New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Modernize codebase

2 participants