feat: add NormalizedRow API, ResidualErrorModel and NCA#187
feat: add NormalizedRow API, ResidualErrorModel and NCA#187
Conversation
This commit adds two related features: ## NormalizedRow API (parser/) - New struct for format-agnostic data parsing - Decouples column mapping from event creation logic - Full ADDL/II expansion support (both positive and negative directions) - Refactors pmetrics.rs to use NormalizedRow internally - Enables external tools (like vial) to reuse parsing logic without reimplementing ADDL expansion ## ResidualErrorModel (data/) - New for parametric algorithms (SAEM, FOCE) - Uses prediction-based sigma (vs observation-based in ErrorModel) - Adds and functions - Documentation clarifying ErrorModel vs ResidualErrorModel usage Both features are independent but included together to avoid merge conflicts.
e418fad to
da044a9
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds two independent features to support different algorithmic approaches in pharmacometric modeling:
Purpose: Enable external tools to reuse pharmsol's data parsing logic and provide prediction-based error models for parametric estimation algorithms.
Key Changes:
- Introduces
NormalizedRowAPI for format-agnostic data parsing with full ADDL/II expansion (positive and negative directions) - Adds
ResidualErrorModelfor parametric algorithms (SAEM, FOCE) that uses prediction-based sigma calculations - Refactors pmetrics.rs to use NormalizedRow internally
- Adds
log_likelihood_batchandlog_likelihood_subjecthelper functions for batch likelihood computation
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/data/residual_error.rs | New module implementing ResidualErrorModel and ResidualErrorModels for parametric algorithms with prediction-based sigma computation |
| src/data/parser/normalized.rs | New module providing NormalizedRow/NormalizedRowBuilder for format-agnostic data parsing with ADDL expansion |
| src/data/parser/pmetrics.rs | Refactored to use NormalizedRow internally, removing duplicate ADDL logic |
| src/simulator/likelihood/mod.rs | Adds log_likelihood_batch and log_likelihood_subject functions; updates documentation to clarify ErrorModel vs ResidualErrorModel usage |
| src/simulator/equation/mod.rs | Documentation update clarifying when to use ResidualErrorModels |
| src/simulator/equation/sde/mod.rs | Removes redundant comment about underflow |
| src/optimize/effect.rs | Adds comprehensive documentation for get_e2 function |
| src/lib.rs | Exports new types through prelude |
| src/data/parser/mod.rs | Adds normalized module |
| src/data/mod.rs | Adds residual_error module and documentation |
| src/data/error_model.rs | Adds utility methods is_proportional/is_additive and clarifies observation-based sigma usage |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// # Example | ||
| /// ```ignore | ||
| /// use pharmsol::{log_likelihood_batch, ResidualErrorModel, ResidualErrorModels}; | ||
| /// | ||
| /// let residual_error = ResidualErrorModels::new() | ||
| /// .add(0, ResidualErrorModel::constant(0.5)); | ||
| /// | ||
| /// let log_liks = log_likelihood_batch( | ||
| /// &equation, | ||
| /// &data, | ||
| /// ¶meters, | ||
| /// &residual_error, | ||
| /// )?; | ||
| /// ``` |
There was a problem hiding this comment.
The documentation example cannot compile because log_likelihood_batch and log_likelihood_subject are not re-exported at the crate root. They are only available in the prelude under pharmsol::prelude::simulator. The import should be either: use pharmsol::prelude::simulator::{log_likelihood_batch, log_likelihood_subject}; or the functions should be added to the crate root re-exports. Additionally, ResidualErrorModel and ResidualErrorModels would need to be imported from pharmsol::prelude::data or directly from pharmsol (which works via the pub use crate::data::*; re-export).
|
|
||
| // Handle ADDL/II expansion | ||
| if let (Some(addl), Some(ii)) = (self.addl, self.ii) { | ||
| if addl != 0 && ii > 0.0 { |
There was a problem hiding this comment.
The ADDL expansion logic requires ii > 0.0, meaning negative or zero interdose intervals are silently ignored. This could lead to unexpected behavior where users provide negative II values and no additional doses are generated. Consider either: (1) documenting this requirement explicitly in the function documentation, or (2) returning an error when ADDL is non-zero but II is not positive, to make the failure explicit rather than silent.
* nca * wip: current version * feat: nca * clenup * chore: documentation * chore: cleanup * chore: cleanup * chore: deprecating ErrorModel in favor of AssayErrorModel, subdividing the likelihood module and deprecating linear space likelihood calculation functions * feat: the Data parsing is centraliced to NormalizedRow * feat: the ErrorModel -> AssayErrorModel * feat: validation * chore: cleanup * chore: cleanup
tests/nca/test_terminal.rs
Outdated
| 9.07, // 100 * e^(-0.1*24) | ||
| ]; | ||
|
|
||
| let result = calculate_lambda_z_adjusted_r2(×, &concs, None); |
There was a problem hiding this comment.
That is the old API for that function, the weird thing is that tests are passing. It seems those might not be wired up. I'll look into that
There was a problem hiding this comment.
I just pushed a new commit with the fix
| /// | ||
| /// This type alias is provided for backward compatibility. | ||
| /// New code should use [`AssayErrorModel`] directly. | ||
| #[deprecated( |
There was a problem hiding this comment.
Why not just remove it, since we are going for a new minor verison anyway?
There was a problem hiding this comment.
I ended up deprecating all the old functions to make clear what I am removing from the API to make it easy to find / discuss.
I am aiming to a major patch, the API is changing a lot, there are other deprecated functions
Co-authored-by: Markus Hovd <markushh@uio.no>
src/data/parser/normalized.rs
Outdated
| //! | ||
| //! This module provides a format-agnostic intermediate representation that decouples | ||
| //! column naming/mapping from event creation logic. Any data source (CSV with custom | ||
| //! columns, Excel, DataFrames) can construct [`NormalizedRow`] instances, then use | ||
| //! [`NormalizedRow::into_events()`] to get properly parsed pharmsol Events. | ||
| //! | ||
| //! # Design Philosophy | ||
| //! | ||
| //! The key insight is separating two concerns: | ||
| //! 1. **Row Normalization** - Transform arbitrary input formats into a standard representation | ||
| //! 2. **Event Creation** - Convert normalized rows into pharmsol Events (with ADDL expansion, etc.) | ||
| //! | ||
| //! This allows any consumer (GUI applications, scripts, other tools) to bring their own | ||
| //! "column mapping" while reusing parsing logic. |
There was a problem hiding this comment.
| //! | |
| //! This module provides a format-agnostic intermediate representation that decouples | |
| //! column naming/mapping from event creation logic. Any data source (CSV with custom | |
| //! columns, Excel, DataFrames) can construct [`NormalizedRow`] instances, then use | |
| //! [`NormalizedRow::into_events()`] to get properly parsed pharmsol Events. | |
| //! | |
| //! # Design Philosophy | |
| //! | |
| //! The key insight is separating two concerns: | |
| //! 1. **Row Normalization** - Transform arbitrary input formats into a standard representation | |
| //! 2. **Event Creation** - Convert normalized rows into pharmsol Events (with ADDL expansion, etc.) | |
| //! | |
| //! This allows any consumer (GUI applications, scripts, other tools) to bring their own | |
| //! "column mapping" while reusing parsing logic. |
src/data/row.rs
Outdated
| Event::Observation(obs) => { | ||
| assert_eq!(obs.time(), 1.0); | ||
| assert_eq!(obs.value(), Some(25.5)); | ||
| assert_eq!(obs.outeq(), 0); // Converted to 0-indexed |
There was a problem hiding this comment.
I am not sure if we want to convert this to 0-indexed if we are parsing a "NormalizedRow". I think that is something that is special to Pmetrics, because R is 1-indexed. So in this case, it should be 1.
There was a problem hiding this comment.
I think we should stop removing one to the index values all around the codebase. If pmetrics wants to use compartment 1 we can do that by allocating two elements in the vector. keeping track if it is 0-index or 1-index has been very painful lately for me
There was a problem hiding this comment.
I love what I am reading! Yes please :) :) :) :) :)
… require extra size
* chore: Rename modules and structures * Update src/error/mod.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update src/data/parser/mod.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Name changes --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
| Branch | feat/normalized-row-api |
| Testbed | mhovd-pgx |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result nanoseconds (ns) (Result Δ%) | Upper Boundary nanoseconds (ns) (Limit %) |
|---|---|---|---|
| Analytical vs ODE/One-compartment IV/Analytical | 📈 view plot 🚷 view threshold | 841.20 ns(-3.57%)Baseline: 872.36 ns | 1,863.72 ns (45.14%) |
| Analytical vs ODE/One-compartment IV/ODE | 📈 view plot 🚷 view threshold | 17,506.00 ns(-0.35%)Baseline: 17,568.00 ns | 19,540.87 ns (89.59%) |
| Analytical vs ODE/One-compartment oral/Analytical | 📈 view plot 🚷 view threshold | 852.81 ns(-3.17%)Baseline: 880.71 ns | 1,768.50 ns (48.22%) |
| Analytical vs ODE/One-compartment oral/ODE | 📈 view plot 🚷 view threshold | 25,592.00 ns(-0.12%)Baseline: 25,624.00 ns | 26,642.26 ns (96.06%) |
| Analytical vs ODE/Two-compartment IV/Analytical | 📈 view plot 🚷 view threshold | 923.13 ns(-3.15%)Baseline: 953.12 ns | 1,907.25 ns (48.40%) |
| Analytical vs ODE/Two-compartment IV/ODE | 📈 view plot 🚷 view threshold | 26,338.00 ns(-0.01%)Baseline: 26,341.00 ns | 26,436.46 ns (99.63%) |
| Analytical vs ODE/Two-compartment oral/Analytical | 📈 view plot 🚷 view threshold | 939.50 ns(-5.48%)Baseline: 994.00 ns | 2,728.22 ns (34.44%) |
| Analytical vs ODE/Two-compartment oral/ODE | 📈 view plot 🚷 view threshold | 29,216.00 ns(+0.26%)Baseline: 29,139.00 ns | 31,589.18 ns (92.49%) |
| Conditional dose modification | 📈 view plot 🚷 view threshold | 1,203.60 ns(+0.25%)Baseline: 1,200.60 ns | 1,296.06 ns (92.87%) |
| Create large dataset (100 subjects) | 📈 view plot 🚷 view threshold | 53,644.00 ns(-0.32%)Baseline: 53,818.50 ns | 59,371.18 ns (90.35%) |
| Data expand complex (1h intervals) | 📈 view plot 🚷 view threshold | 27,761.00 ns(+1.37%)Baseline: 27,385.00 ns | 39,349.51 ns (70.55%) |
| Data expand simple (1h intervals) | 📈 view plot 🚷 view threshold | 492.53 ns(+1.15%)Baseline: 486.94 ns | 664.82 ns (74.09%) |
| Data expand with additional time | 📈 view plot 🚷 view threshold | 38,271.00 ns(+0.72%)Baseline: 37,997.00 ns | 46,715.82 ns (81.92%) |
| Filter exclude subjects | 📈 view plot 🚷 view threshold | 30,938.00 ns(-0.24%)Baseline: 31,013.00 ns | 33,399.54 ns (92.63%) |
| Filter include subjects | 📈 view plot 🚷 view threshold | 7,861.40 ns(-1.48%)Baseline: 7,979.10 ns | 11,724.37 ns (67.05%) |
| Modify all bolus doses | 📈 view plot 🚷 view threshold | 1,169.40 ns(+0.32%)Baseline: 1,165.65 ns | 1,284.98 ns (91.01%) |
| Modify all infusion doses | 📈 view plot 🚷 view threshold | 1,202.30 ns(-0.17%)Baseline: 1,204.35 ns | 1,269.58 ns (94.70%) |
| SubjectBuilder multi-occasion | 📈 view plot 🚷 view threshold | 261.43 ns(-0.06%)Baseline: 261.60 ns | 267.01 ns (97.91%) |
| SubjectBuilder simple | 📈 view plot 🚷 view threshold | 102.78 ns(-0.33%)Baseline: 103.12 ns | 114.10 ns (90.08%) |
| SubjectBuilder with covariates | 📈 view plot 🚷 view threshold | 265.83 ns(-0.99%)Baseline: 268.50 ns | 353.46 ns (75.21%) |
| one_compartment | 📈 view plot 🚷 view threshold | 19,484.00 ns(-0.11%)Baseline: 19,506.00 ns | 20,206.05 ns (96.43%) |
| one_compartment_covariates | 📈 view plot 🚷 view threshold | 25,541.00 ns(-0.45%)Baseline: 25,657.50 ns | 29,364.59 ns (86.98%) |
| readme 20 | 📈 view plot 🚷 view threshold | 304,150.00 ns(+0.86%)Baseline: 301,560.00 ns | 383,975.14 ns (79.21%) |
| two_compartment | 📈 view plot 🚷 view threshold | 21,586.00 ns(-0.24%)Baseline: 21,638.50 ns | 23,309.08 ns (92.61%) |
This commit adds two related features:
NormalizedRow API (parser/)
ResidualErrorModel (data/)
Both features are independent but included together to avoid merge conflicts.