---
format:
  gfm:
    default-image-extension: ""
    wrap: none
jupyter: python3
execute:
  warning: false
  echo: true
---
<!-- README.md is generated from README.qmd. Edit the .qmd and run `make readme`. -->
```{python}
#| include: false
import plotly.io as pio
pio.renderers.default = "png" # static plotly images for the README
```
# moderndive (Python)
<img src="https://raw.githubusercontent.com/moderndive/moderndive-python/main/doc/_static/moderndive-logo.png" align="right" height="160" alt="ModernDive hex logo" />
[Tests](https://github.com/moderndive/moderndive-python/actions/workflows/tests.yml)
[Coverage](https://codecov.io/gh/moderndive/moderndive-python)
[Documentation](https://moderndive.readthedocs.io/en/latest/)
[License](LICENSE)
The Python companion package for **ModernDive: Statistical Inference via Data
Science** — a faithful port of the R [`moderndive`](https://moderndive.github.io/moderndive/)
and [`infer`](https://infer.tidymodels.org) packages to a modern Python
data-science stack ([polars](https://pola.rs), [plotly](https://plotly.com/python/),
[plotnine](https://plotnine.org), [statsmodels](https://www.statsmodels.org)).
📖 **Documentation (with runnable examples): <https://moderndive.readthedocs.io>**
The package is intentionally **pure-Python** (no compiled extensions), so it can be
installed under [Pyodide](https://pyodide.org) via `micropip` for in-browser execution.
## Installation
```bash
pip install moderndive # from PyPI (once published)
# or, from source:
pip install git+https://github.com/moderndive/moderndive-python
```
## What's inside
- **A tidy simulation-inference grammar** mirroring R `infer`:
`specify → hypothesize → generate → calculate`, plus `fit()` for multiple
regression, `observe()`, and `assume()` (theoretical t/z/F/Chisq). `specify()`
is also available as a DataFrame method, so you can write
`df.specify(...)` just like R's `df %>% specify(...)`. `calculate(stat=...)`
takes the full infer vocabulary **or any custom callable** test statistic.
Summaries via `get_p_value` / `get_confidence_interval` (percentile, SE,
bias-corrected); British-spelling and short aliases included.
- **Dual-engine plots**: `visualize` / `shade_p_value` / `shade_confidence_interval`
(and every plot helper) take `engine="plotly"` (default, interactive) or
`engine="plotnine"` — same code, your choice of output.
- **Theory-based wrapper tests**: `t_test`, `prop_test`, `chisq_test`,
`t_stat`, `chisq_stat`, plus the `moderndive.theory` module.
- **Regression & summary helpers** mirroring R `moderndive`: `get_regression_table`,
`get_regression_points`, `get_regression_summaries`, `get_correlation`,
`pop_sd`, `tidy_summary`, `count_missing` (built on `statsmodels` where
relevant, returning `polars` frames), plus the model plots
`gg_parallel_slopes` / `geom_parallel_slopes` and
`gg_categorical_model` / `geom_categorical_model`, and `pairplot`
(the `GGally::ggpairs` analog).
- **Sampling**: `rep_slice_sample` / `rep_sample_n` for sampling-distribution
activities.
- **58 datasets**: `load_*()` loaders returning `polars` DataFrames (the
`moderndive`/`infer`, `nycflights23`, `gapminder`, ISLR2, and FiveThirtyEight
datasets used in the book).
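To make the percentile method mentioned above concrete, here is a minimal standard-library sketch of a percentile bootstrap confidence interval for a mean. It is an illustration of the general technique only, not the package's implementation: the function name `bootstrap_percentile_ci`, its parameters, and the sample values are all made up for this example.

```python
import random
import statistics


def bootstrap_percentile_ci(values, reps=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(reps)
    )
    # Keep the middle `level` share of the sorted bootstrap means.
    alpha = (1 - level) / 2
    lower = boot_means[round(alpha * reps)]
    upper = boot_means[round((1 - alpha) * reps) - 1]
    return lower, upper


# Made-up sample values, not one of the bundled datasets.
sample = [12.1, 9.8, 11.4, 10.6, 13.0, 9.2, 10.9, 11.7, 12.4, 10.1]
lower, upper = bootstrap_percentile_ci(sample)
print(f"95% percentile CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The seed makes the resampling reproducible, in the same spirit as the `seed` argument passed to `generate()` in the Quick start.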
## Quick start
Are tracks more likely to be popular in *metal* than in *deep house*? Compute the
observed difference in "popular" rates, then permute the genre labels 1000 times
to build a null distribution and read off a p-value.
```{python}
import moderndive as md
from moderndive import get_p_value, visualize, shade_p_value
spotify = md.load_spotify_metal_deephouse()
# Observed difference in popularity rates (metal − deep house)
obs = (
    spotify
    .specify(formula="popular_or_not ~ track_genre", success="popular")
    .calculate(stat="diff in props", order=("metal", "deep-house"))
)
obs
```
```{python}
# Permutation null distribution + p-value
null = (
    spotify
    .specify(formula="popular_or_not ~ track_genre", success="popular")
    .hypothesize(null="independence")
    .generate(reps=1000, type="permute", seed=76)
    .calculate(stat="diff in props", order=("metal", "deep-house"))
)
print(get_p_value(null, obs_stat=obs, direction="right"))
```
```{python}
# Visualize — interactive plotly by default; engine="plotnine" for ggplot-style
visualize(null) + shade_p_value(obs_stat=obs, direction="right")
```
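For intuition about what a permutation test like the one above computes, the following standard-library sketch shuffles group labels and counts how often the shuffled difference in proportions is at least as large as the observed one. It is a simplified illustration under stated assumptions, not the package's code: the helper names `diff_in_props` and `permutation_p_value` and the toy data are invented for this example and are not the Spotify dataset.

```python
import random


def diff_in_props(outcomes, groups, success, order):
    """Proportion of `success` in order[0] minus the proportion in order[1]."""
    def prop(group):
        in_group = [o for o, g in zip(outcomes, groups) if g == group]
        return sum(o == success for o in in_group) / len(in_group)

    return prop(order[0]) - prop(order[1])


def permutation_p_value(outcomes, groups, success, order, reps=1000, seed=0):
    """Right-tailed permutation p-value for a difference in proportions."""
    rng = random.Random(seed)
    observed = diff_in_props(outcomes, groups, success, order)
    shuffled = list(groups)
    extreme = 0
    for _ in range(reps):
        # Shuffling the labels breaks any link between outcome and group,
        # which is what the independence null hypothesis asserts.
        rng.shuffle(shuffled)
        if diff_in_props(outcomes, shuffled, success, order) >= observed:
            extreme += 1
    return observed, extreme / reps


# Made-up toy data: 14 of 20 "metal" tracks and 7 of 20 "deep-house" tracks are popular.
outcomes = (
    ["popular"] * 14 + ["not popular"] * 6
    + ["popular"] * 7 + ["not popular"] * 13
)
groups = ["metal"] * 20 + ["deep-house"] * 20

observed, p_value = permutation_p_value(
    outcomes, groups, success="popular", order=("metal", "deep-house")
)
print(f"observed difference: {observed:.2f}, p-value: {p_value:.3f}")
```

The p-value is the share of shuffles whose statistic is at least as large as the observed one, matching the `direction="right"` choice in the example above.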
## Development
This repo uses [uv](https://docs.astral.sh/uv/).
```bash
uv sync --extra dev # create the environment
make test # run the test suite (enforces 100% coverage)
make readme # re-render README.md from README.qmd (needs Quarto)
make build-data # rebuild the bundled Parquet datasets (needs R; see tools/)
make build # build the wheel/sdist
```
The test suite is held at **100% statement coverage** (enforced in CI via
`--cov-fail-under=100`). Releases are automated on `v*` tags — see
[RELEASING.md](RELEASING.md).
## License
This *software package* is MIT-licensed. The ModernDive book *content* is licensed
separately under CC-BY-NC-SA 4.0.