Skip to content

Conversation

@danielfong-act
Copy link
Contributor

@danielfong-act danielfong-act commented Jan 21, 2026

Documentation Editing Concerns

I'd like to start here, just in case I have to close this PR to rework everything. This all took some time, in part because of my concerns over how to edit the docs. I found that merely opening a notebook in jupyter lab (invoked by uv run --with jupyter jupyter lab) immediately changed many parts of the file. While I can still build the docs locally, I'm worried that this will break things upstream. Editing the notebook using a text editor is cumbersome outside of small edits, and would have made adding cells incredibly annoying. I couldn't find a safe way to edit things otherwise, so I went ahead and used jupyter lab. Please tell me if there's a better way of editing the docs, or if I am worrying about nothing.
EDIT: I built the docs here. This seems like a non-issue.

Changes

This PR is intended to put any BarnettZehnwirth related issues or concerns in the ground. The documentation related to this estimator in user_guide/development has been changed in the following ways:

  1. The PTF formula was updated to match implementation/fix a typo
  2. The 3 direction linear (ordinal) model was removed
  3. Documentation for chainladder.utility_functions.PTF_formula was added
  4. The model from this paper (pp. 48-49) was added as an example for how to apply PTF_formula.
  5. Added the above paper to the references.
  6. Things were moved around/reworded where I saw appropriate.

In addition, I changed how gamma and iota are passed to PTF_formula, and changed test_barnzehn accordingly.

To be changed

Before merging I would like to address some things, including implementation considerations raised earlier.

  1. Do we want to integrate PTF_formula somewhere into BarnettZehnwirth/PatsyFormula/etc.?
  2. Do we want to change how the greek parameters are passed to PTF_formula? Currently, parameter groups are passed as tuples denoting boundaries. This convention is consistent and very explicit, but adds some unnecessary redundancy.
  3. Documentation rewording/restructuring. The main goal is clarity.
  4. Any other concerns not listed here.
    If any discussion runs long or major changes are suggested, we could also create a new branch and iterate from there.

@codecov
Copy link

codecov bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 73.07692% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.02%. Comparing base (ad4d081) to head (dcd9313).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
chainladder/development/barnzehn.py 64.70% 3 Missing and 3 partials ⚠️
chainladder/utils/utility_functions.py 88.88% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #657      +/-   ##
==========================================
- Coverage   85.10%   85.02%   -0.09%     
==========================================
  Files          85       85              
  Lines        4869     4881      +12     
  Branches      619      627       +8     
==========================================
+ Hits         4144     4150       +6     
- Misses        517      520       +3     
- Partials      208      211       +3     
Flag Coverage Δ
unittests 85.02% <73.07%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@henrydingliu
Copy link
Collaborator

henrydingliu commented Jan 22, 2026

Do we want to integrate PTF_formula somewhere into BarnettZehnwirth/PatsyFormula/etc.?

I would consider building it directly into the PatsyFormula class in this package

Do we want to change how the greek parameters are passed to PTF_formula? Currently, parameter groups are passed as tuples denoting boundaries. This convention is consistent and very explicit, but adds some unnecessary redundancy.

List would be cleaner.

Documentation rewording/restructuring. The main goal is clarity.

Large changes to the doc is impossible to review at the moment outside of downloading the notebook. Can you please either create a RTD instance on your fork or enable doc versioning on the main repo?

@danielfong-act
Copy link
Contributor Author

danielfong-act commented Jan 22, 2026

I would consider building it directly into the PatsyFormula class in this package

Shouldn't be too hard, but I'll have to add some logic to handle BZ being passed a formula and Greeks. I'll probably have it throw an error if both are given, since the coefficients produced by basic Patsy categorical/ordinal formulas don't match those of PTF_formula's formulas.

List would be cleaner.

Iotas and gammas would be easy to visualize as a list of endpoints. Alphas would be less explicit this way, because they would become the first endpoint in a bucket, with the other endpoint being implied. Still, it's probably worth moving to lists so users don't have to type a bunch of redundant tuples.

create a RTD instance

oh duh, I can just do that
https://danielfong-act-chainladder-python.readthedocs.io/en/latest/user_guide/development.html
It seems like nothing broke. I'll update the docs shortly when I change those two things above.

@henrydingliu
Copy link
Collaborator

but I'll have to add some logic to handle BZ being passed a formula and Greeks

would it break anything if you just add the greeks to the formula if provided?

I'll update the docs shortly when I change those two things above

I think about the theoretical progression as the ELRF residual being beginner, PTF residual being intermediate, and the BZ08 PTF residual being advanced. So do we really want something as advanced as BZ08 PTF in the main body in the development user guide? Would it be more suitable as an example?

@danielfong-act
Copy link
Contributor Author

would it break anything if you just add the greeks to the formula if provided?

Kinda yeah. The formulation in my updated docs matches the output produced by fitting models with PTF_formula, with a minor difference from the authors' formulation. Using Patsy formulas to fit categorical variables directly leads to a fairly different formulation.

  1. The intercept is no longer equal to $\alpha_0$, since fitting C(development) directly estimates a coefficient for the first development period. Thus, $\alpha_0$ is decomposed into the intercept plus a coefficient for the first development year, $\gamma_0$.
  2. Each origin coefficient is the incremental trend added onto the intercept. In terms of the PTF_formula formulation (NOT the authors' formulation), the $i$'th origin coefficient is equal to $\sum_{k=1}^i\alpha_k$.
  3. Similarly, each development coefficient represent that year's incremental trend above the baseline. In terms of the authors' formulation, the $j$'th development trend is $\sum_{k=1}^j \gamma_k$, which does not include the first year's development trend $\gamma_0$.
  4. The first development year is year 0 in the papers and in PTF_formula. Using Patsy directly causes the first year to be whatever the development grain is. This causes ordinal valuation trends to affect the intercept (origin and valuation years are indexed from 0, and thus do not affect the intercept).

I think mixing the two would lead to coefficients that are harder to interpret. Side note, I should add something to make it clear that the formulation is different for the pathological example.

So do we really want something as advanced as BZ08 PTF in the main body in the development user guide? Would it be more suitable as an example?

Maybe, but I don't have another example that uses PTF_formula. I think the functionality provided by PTF_formula is core to the authors' intentions; without it, we're left with categorical/ordinal only models. We'll have to come up with something to take its place.

danielfong-act added 2 commits January 22, 2026 14:17
…ket. Gammas/iotas are the endpoints of each linear piece. Updated test_barnzehn accordingly
@danielfong-act
Copy link
Contributor Author

I changed PTF_formula to take period lists, not tuple lists. Docs are updating rn.
I'm reluctant to add PTF_formula functionality to PastyFormula itself, because it'd be exposing unnecessary parameters to everything else that uses PatsyFormula.
Integrating PTF_formula into BarnettZehnwirth seems fine to me. It does create some weirdness, where the formula might be defined explicitly during initialization or fitting, depending on how it's specified.

@henrydingliu
Copy link
Collaborator

because it'd be exposing unnecessary parameters to everything else that uses PatsyFormula

would it really be that much different from what's written into PTF_Formula already? consider the following pseudo code

class patsy():
 def init(formula, alpha, gamma, iota)
  self.formula = formula
  if alpha:
   self.formula += f'I(shenanigans{alpha}'
...

standardizing gamma as in [1,2,3,6,8,11] instead of [12,24,36,72,96,132]. this allows you to get rid of having to pass in the triangle.

@danielfong-act
Copy link
Contributor Author

would it really be that much different from what's written into PTF_Formula

Not really, which is why I don't see the motivation to change it. It adds something to PatsyFormula that other estimators won't use, which, I suppose, isn't gonna hurt anything. I just don't see why we want this.

standardizing gamma

Sure why not? Can be done as a preprocessing step even.

this allows you to get rid of having to pass in the triangle.

For the most part, but PTF_formula needs some way of knowing the development grain. If we move to passing in a grain-agnostic development, then we need either the triangle or another parameter to specify the grain. If we're hiding PTF_formula somewhere in BarnettZehnwirth, then the latter option is just fine by me. Development grain can be harvested from the triangle before the formula is built.
If PTF_formula stuff is in PatsyFormula, we're passing along even more parameters that are only used by one thing. It seems less clean, but I don't really have a good reason to object to it. I'll draft it up both ways and we can decide from there.

Thoughts on moving this to a branch?

danielfong-act added 4 commits January 23, 2026 12:51
@danielfong-act
Copy link
Contributor Author

Not quite done with docs, since I still need to conjure up a simpler PTF example. 0a58083 has PTF_formula in BarnettZehnwirth.fit; is 840bc4f what you had in mind for PatsyFormula, more or less?

@henrydingliu
Copy link
Collaborator

For the most part, but PTF_formula needs some way of knowing the development grain

thought about this some more. ultimately my idea of standardization doesn't actually accomplish anything. we don't have any other examples in this package of deriving an annual pattern that can then be applied to quarterly triangles. we went down this rabbit hole with dividing by dgrain because that's what the BZ method did. i'm starting to think that the BZ method should be leaving the developement vector in months. not only that, the BZ method should be leaving both origin and valuation in the actual years. otherwise the BZ method becomes very tedious to maintain from year to year, i.e. if you run a fixed-size triangle for your reserve review, you have to increment the BZ parameters every analysis.

tl; dr: I'm starting to think we should evolve from the paper, and start defining our PTF parameters in their original magnitudes. i.e. origin buckets as [1977,1979,1980,1982], dev buckets as [12,24,36,60,84,132], val buckets as [1977,1984,1985,1988]

this would require quite a bit of rework of the BZ method. i think we are all a little overstimulated on the BZ method. I'm good on approving this whenever you think the doc is done.

@danielfong-act
Copy link
Contributor Author

Making future analysis easier provides a pretty good motivation to how dates are handled/indexed. Since we have pandas under the hood anyways, it might be worth using datetimeindices and offsets. We can leave that for the future, but it's worth noting here.

i think we are all a little overstimulated on the BZ method

Heh yeah. I just wanted to leave things in a more "finished" state, and I doubt I'm alone in that. For now I will leave the authors' example where it is. The m3ir5 dataset would be a good choice for an analysis to replace it, if someone wishes to perform it.

There is one bit of weirdness, however. Somehow, the outputs of the Parsimonious Modeling and PatsyFormula section of user_guide/development changed. This PR doesn't touch any code relevant to those sections and the development book hasn't been touched in 2 years, so I think this change (bug?) is caused by an older PR.

@henrydingliu
Copy link
Collaborator

There is one bit of weirdness, however. Somehow, the outputs of the Parsimonious Modeling and PatsyFormula section of user_guide/development changed. This PR doesn't touch any code relevant to those sections and the development book hasn't been touched in 2 years, so I think this change (bug?) is caused by an older PR.

could be pandas 3.0. i just pushed a change to address. see if your RTD instance fixes itself after you sync

@danielfong-act
Copy link
Contributor Author

danielfong-act commented Jan 25, 2026

Well that's embarrassing. The PatsyFormula section fits a random forest, so things should change every time.
Numbers in Parsimonious Modeling are still a bit off but I'm not sure if it's a cause for concern. The numbers in the live docs don't match those in the notebook before or after this merge. Might just be the optimizer or something.

Copy link
Collaborator

@henrydingliu henrydingliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henrydingliu henrydingliu merged commit 238db22 into casact:master Jan 26, 2026
6 of 8 checks passed
@danielfong-act
Copy link
Contributor Author

@henrydingliu thanks, and thanks for sticking with this one so long.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants