This repository hosts code to replicate and explore the analysis and results in the paper "An Open-Source Cultural Consensus Approach to Name-Based Gender Classification" (arXiv, ICWSM). Data supporting this work can be found here, and an associated Python package for making name-based gender classifications in practice can be found here.
The notebook reproduce_analysis.ipynb reproduces the results and figures in the paper. Earlier stages of the analysis are captured in the notebooks construct_data_resources.ipynb and aggregate_name_data.ipynb. One way to replicate the results is as follows, beginning in a terminal (the leading % is the shell prompt and is not typed):
- % git clone https://github.com/ianvanbuskirk/nbgc.git
- % cd nbgc
- % python3 -m venv .venv
- % source .venv/bin/activate
- % pip install -r requirements.txt
- % python -m ipykernel install --user --name=nbgc
- % jupyter notebook
The commands above clone this repository, create a virtual environment, install the required packages, register that environment as a Jupyter kernel named nbgc, and start the notebook server. Next, open reproduce_analysis.ipynb, make sure the nbgc kernel is selected, and run all cells.
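If you would rather run the notebook without opening the browser interface, the same steps can be approximated from a script. The following is a minimal sketch, not part of this repository: the helper names `in_virtualenv`, `build_nbconvert_command`, and `run_notebook`, and the output name `reproduce_analysis_executed`, are hypothetical. It assumes that `jupyter nbconvert` is available in the active environment (it is normally installed alongside the notebook server) and that the nbgc kernel has been registered as described above.

```python
import subprocess
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the current interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def build_nbconvert_command(notebook="reproduce_analysis.ipynb",
                            kernel="nbgc",
                            output=None):
    """Build the nbconvert command that executes a notebook headlessly.

    The default output name is an assumption made for this sketch; it keeps
    the original notebook untouched by writing the executed copy elsewhere.
    """
    nb = Path(notebook)
    out = output or f"{nb.stem}_executed"
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",            # write the result as a notebook
        "--execute",                   # run all cells before writing
        f"--ExecutePreprocessor.kernel_name={kernel}",
        "--output", out,
        str(nb),
    ]


def run_notebook(notebook="reproduce_analysis.ipynb", kernel="nbgc"):
    """Execute the notebook; raises CalledProcessError if a cell fails."""
    if not in_virtualenv():
        print("Warning: no virtual environment appears to be active.")
    subprocess.run(build_nbconvert_command(notebook, kernel), check=True)
```

Calling `run_notebook()` from the repository root, with the virtual environment active, should execute every cell and write the executed copy as a separate notebook. Long-running cells may exceed nbconvert's default per-cell timeout, in which case the timeout would need to be raised.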
To cite this work, please use the following BibTeX entry:
@article{van2022open,
title={An Open-Source Cultural Consensus Approach to Name-Based Gender Classification},
author={Van Buskirk, Ian and Clauset, Aaron and Larremore, Daniel B},
journal={arXiv preprint arXiv:2208.01714},
note={\url{https://github.com/ianvanbuskirk/nbgc}},
year={2022}
}