-
Notifications
You must be signed in to change notification settings - Fork 91
Barnett Zehnwirth documentation and formulation rework #657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…th implementation. Added note about difference in formulation. Fixed valuation (iota) summation term.
…ple from 2008 paper. Updated reference list
…. More changes to BZ documentation.
…raphs from 2008 paper for 'reasonable model'
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #657 +/- ##
==========================================
- Coverage 85.10% 85.02% -0.09%
==========================================
Files 85 85
Lines 4869 4881 +12
Branches 619 627 +8
==========================================
+ Hits 4144 4150 +6
- Misses 517 520 +3
- Partials 208 211 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
I would consider building it directly into the PatsyFormula class in this package
List would be cleaner.
Large changes to the doc is impossible to review at the moment outside of downloading the notebook. Can you please either create a RTD instance on your fork or enable doc versioning on the main repo? |
Shouldn't be too hard, but I'll have to add some logic to handle BZ being passed a formula and Greeks. I'll probably have it throw an error if both are given, since the coefficients produced by basic Patsy categorical/ordinal formulas don't match those of PTF_formula's formulas.
Iotas and gammas would be easy to visualize as a list of endpoints. Alphas would be less explicit this way, because they would become the first endpoint in a bucket, with the other endpoint being implied. Still, it's probably worth moving to lists so users don't have to type a bunch of redundant tuples.
oh duh, I can just do that |
would it break anything if you just add the greeks to the formula if provided?
I think about the theoretical progression as the ELRF residual being beginner, PTF residual being intermediate, and the BZ08 PTF residual being advanced. So do we really want something as advanced as BZ08 PTF in the main body in the development user guide? Would it be more suitable as an example? |
Kinda yeah. The formulation in my updated docs matches the output produced by fitting models with PTF_formula, with a minor difference from the authors' formulation. Using Patsy formulas to fit categorical variables directly leads to a fairly different formulation.
I think mixing the two would lead to coefficients that are harder to interpret. Side note, I should add something to make it clear that the formulation is different for the pathological example.
Maybe, but I don't have another example that uses PTF_formula. I think the functionality provided by PTF_formula is core to the authors' intentions; without it, we're left with categorical/ordinal only models. We'll have to come up with something to take its place. |
…ket. Gammas/iotas are the endpoints of each linear piece. Updated test_barnzehn accordingly
|
I changed PTF_formula to take period lists, not tuple lists. Docs are updating rn. |
would it really be that much different from what's written into class patsy():
def init(formula, alpha, gamma, iota)
self.formula = formula
if alpha:
self.formula += f'I(shenanigans{alpha}'
...standardizing gamma as in [1,2,3,6,8,11] instead of [12,24,36,72,96,132]. this allows you to get rid of having to pass in the triangle. |
Not really, which is why I don't see the motivation to change it. It adds something to PatsyFormula that other estimators won't use, which, I suppose, isn't gonna hurt anything. I just don't see why we want this.
Sure why not? Can be done as a preprocessing step even.
For the most part, but PTF_formula needs some way of knowing the development grain. If we move to passing in a grain-agnostic development, then we need either the triangle or another parameter to specify the grain. If we're hiding PTF_formula somewhere in BarnettZehnwirth, then the latter option is just fine by me. Development grain can be harvested from the triangle before the formula is built. Thoughts on moving this to a branch? |
…thin BZ estimator (called during fit). Changed how gamma is passed: index from 0 and increment grain-agnostic. Docs not updated yet
thought about this some more. ultimately my idea of standardization doesn't actually accomplish anything. we don't have any other examples in this package of deriving an annual pattern that can then be applied to quarterly triangles. we went down this rabbit hole with dividing by dgrain because that's what the BZ method did. i'm starting to think that the BZ method should be leaving the developement vector in months. not only that, the BZ method should be leaving both origin and valuation in the actual years. otherwise the BZ method becomes very tedious to maintain from year to year, i.e. if you run a fixed-size triangle for your reserve review, you have to increment the BZ parameters every analysis. tl; dr: I'm starting to think we should evolve from the paper, and start defining our PTF parameters in their original magnitudes. i.e. origin buckets as [1977,1979,1980,1982], dev buckets as [12,24,36,60,84,132], val buckets as [1977,1984,1985,1988] this would require quite a bit of rework of the BZ method. i think we are all a little overstimulated on the BZ method. I'm good on approving this whenever you think the doc is done. |
|
Making future analysis easier provides a pretty good motivation to how dates are handled/indexed. Since we have pandas under the hood anyways, it might be worth using datetimeindices and offsets. We can leave that for the future, but it's worth noting here.
Heh yeah. I just wanted to leave things in a more "finished" state, and I doubt I'm alone in that. For now I will leave the authors' example where it is. The m3ir5 dataset would be a good choice for an analysis to replace it, if someone wishes to perform it. There is one bit of weirdness, however. Somehow, the outputs of the Parsimonious Modeling and PatsyFormula section of user_guide/development changed. This PR doesn't touch any code relevant to those sections and the development book hasn't been touched in 2 years, so I think this change (bug?) is caused by an older PR. |
could be pandas 3.0. i just pushed a change to address. see if your RTD instance fixes itself after you sync |
|
Well that's embarrassing. The PatsyFormula section fits a random forest, so things should change every time. |
henrydingliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed code changes. reviewed doc changes at https://danielfong-act-chainladder-python.readthedocs.io/en/latest/user_guide/development.html#barnettzehnwirth
|
@henrydingliu thanks, and thanks for sticking with this one so long. |
Documentation Editing Concerns
I'd like to start here, just in case I have to close this PR to rework everything. This all took some time, in part because of my concerns over how to edit the docs. I found that merely opening a notebook in jupyter lab (invoked by
uv run --with jupyter jupyter lab) immediately changed many parts of the file. While I can still build the docs locally, I'm worried that this will break things upstream. Editing the notebook using a text editor is cumbersome outside of small edits, and would have made adding cells incredibly annoying. I couldn't find a safe way to edit things otherwise, so I went ahead and used jupyter lab. Please tell me if there's a better way of editing the docs, or if I am worrying about nothing.EDIT: I built the docs here. This seems like a non-issue.
Changes
This PR is intended to put any BarnettZehnwirth related issues or concerns in the ground. The documentation related to this estimator in user_guide/development has been changed in the following ways:
In addition, I changed how gamma and iota are passed to PTF_formula, and changed test_barnzehn accordingly.
To be changed
Before merging I would like to address some things, including implementation considerations raised earlier.
If any discussion runs long or major changes are suggested, we could also create a new branch and iterate from there.