Optimal Stratified Sampling with Python
This package calculates optimal sampling rates for a stratified sample. The current version only covers sampling rates for an online panel, and includes a Python file for the main function as well as a test script. Code for other survey settings may be added at a later date.
The package uses numerical optimization to calculate sampling rates, following ideas similar to those in Valliant et al. (2018). The setting is a survey with less than a 100% response rate, using a setup similar to Mendelson and Elliott (2024). For the online panel code, full details of the model and computation will appear in a forthcoming working paper. The function takes key parameters as inputs, such as response rates and cost estimates. The algorithm then uses minimization routines from the SciPy package to find the number of sampled cases that yields the lowest average standard error of the estimates, subject to the budget constraint.
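To make the idea of budget-constrained minimization concrete, here is a minimal sketch. It is not the package's function or its model. The variance formula (standard error = stratum standard deviation divided by the square root of expected respondents), all input values, and all names are assumptions made for illustration. The sketch optimizes over budget shares rather than raw case counts, only because that keeps the variables on a similar scale for the solver.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs for three strata (illustrative values, not from the package).
stratum_sd = np.array([10.0, 20.0, 15.0])   # assumed within-stratum standard deviations
response_rate = np.array([0.5, 0.3, 0.4])   # assumed anticipated response rates
cost_per_case = np.array([5.0, 8.0, 6.0])   # assumed cost per sampled case
budget = 10_000.0                           # assumed total budget


def sampled_cases(shares):
    """Convert budget shares into the number of sampled cases per stratum."""
    return shares * budget / cost_per_case


def avg_standard_error(shares):
    """Average standard error of the stratum means under the simplified model."""
    expected_respondents = sampled_cases(shares) * response_rate
    return np.mean(stratum_sd / np.sqrt(expected_respondents))


# Budget constraint: the shares may not add up to more than the whole budget.
budget_constraint = {"type": "ineq", "fun": lambda shares: 1.0 - shares.sum()}

x0 = np.full(3, 1.0 / 3.0)  # start from an equal split of the budget
result = minimize(
    avg_standard_error,
    x0,
    method="SLSQP",
    bounds=[(1e-3, 1.0)] * 3,
    constraints=[budget_constraint],
)

# Round down so the integer allocation stays within the budget.
n_sampled = np.floor(sampled_cases(result.x)).astype(int)
print("sampled cases per stratum:", n_sampled)
print("average standard error:", avg_standard_error(result.x))
```

In this toy setup, strata with higher variability or lower response rates receive a larger share of the budget, while strata with a higher cost per case are sampled less heavily per dollar.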
- Python 3.11.5+
- Required packages (tested versions listed; most other versions should work):
  - numpy (1.24.3)
  - scipy (1.11.1)
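As a quick way to compare an environment against the requirements above, the following sketch reports the running Python version and the installed versions of the required packages. The helper name `check_environment` is hypothetical and is not part of this package; it relies only on the standard library.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(min_python=(3, 11, 5), packages=("numpy", "scipy")):
    """Report the running Python version and installed package versions."""
    report = {"python": ".".join(map(str, sys.version_info[:3]))}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # package is not installed
    # True when the interpreter is at least the minimum listed version.
    report["python_ok"] = sys.version_info[:3] >= min_python
    return report


for name, value in check_environment().items():
    print(f"{name}: {value}")
```

A value of `None` for a package means it is not installed in the current environment.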
Once the release .whl file is downloaded and stored in <wheel location>, a simple
pip install <wheel location>
should work.
This code is from current Census Bureau research and is still being tested and refined. We appreciate any feedback you would like to provide us; please post any questions that you may have in the GitHub issues section.
Please cite this package in any work where it proves useful.
@software{Eggleston_Optimal_Sampling_2025,
author = {Eggleston, Jonathan},
title = {{Optimal Stratified Sampling with Python}},
url = {https://github.com/uscensusbureau/optimal_stratified_sampling/},
version = {1.0.0},
year = {2025}
}
U.S. Census Bureau code is provided on an ‘as is’ basis and the user assumes responsibility for its use. The Census Bureau has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any claims against the Census Bureau stemming from the use of its GitHub project will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by the Census Bureau. The Census Bureau seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by Census Bureau or the United States Government.
Any opinions and conclusions expressed herein are those of the author and do not represent the views of the U.S. Census Bureau.
Mendelson, Jonathan, and Michael R. Elliott. 2024. "Optimal Allocation Under Anticipated Nonresponse." Journal of Survey Statistics and Methodology 12 (5): 1405–29.
Valliant, Richard, Jill A. Dever, and Frauke Kreuter. 2018. Practical Tools for Designing and Weighting Survey Samples. 2nd ed. New York: Springer.