Quantifying Phonosemantic Iconicity Distributionally in 6 Languages

This repository contains the code needed to replicate the experiments reported in the paper of the same title.

Quickstart

It's recommended you use a Conda environment for this, as otherwise package installation can become iffy, especially on macOS.

conda create -n quanticon_env python=3.11.13 -y
conda activate quanticon_env
conda install -c conda-forge fasttext -y
conda install pip -y
pip install -r requirements.txt
conda install ipykernel -y
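
Before opening the notebook, it may help to confirm that the environment actually resolved the main dependencies. The following is a minimal sketch, not part of the repository: the module names in ASSUMED_MODULES are guesses based on the assets listed in this README, so adjust them to match requirements.txt.

```python
import importlib.util
import sys

# Import names are assumptions based on the assets listed in this README;
# adjust them if requirements.txt installs a package under a different name.
ASSUMED_MODULES = ["fasttext", "epitran", "wordfreq", "ipykernel"]


def check_modules(names):
    """Return {module_name: bool} saying whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_modules(ASSUMED_MODULES).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the quanticon_env environment active. Any module reported as MISSING points to an installation step that should be repeated.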

To replicate an individual process reported in the paper, run the corresponding cells in the notebook, which is annotated to guide you.

For a full replication, set an OpenAI API key in .env and click "Run All" in the scripts/experiments.ipynb notebook.

Please note that API use at the scale reported in the paper costs roughly 15 USD. This cost can be reduced by changing top_n_to_decompose in the "Setup" section of the notebook. You can set your API key with:

echo "OPENAI_API_KEY=your_api_key_here" > .env
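
Since a full run incurs API costs, it can be worth checking that the key was written correctly before clicking "Run All". The sketch below uses only the standard library and is an illustration, not the notebook's own loading code (how the notebook reads .env is not described here). The helper names read_env_file and has_api_key are hypothetical.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blank lines and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path=".env"):
    """True if OPENAI_API_KEY is present, non-empty, and not the placeholder."""
    key = read_env_file(path).get("OPENAI_API_KEY", "")
    return bool(key) and key != "your_api_key_here"


if __name__ == "__main__":
    print("API key configured:", has_api_key())
```

Note that the echo command above uses `>`, which overwrites any existing .env file. Use `>>` instead to append to a file that already holds other settings.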

The words.csv and roots.csv files were too large to include in the repository, but they can be shared via a file-sharing service with anyone interested. To request them, or with any other questions, comments, or concerns, contact (redacted for anonymity).

External assets used

G2P (Apache 2.0 License)
Wordfreq (Apache 2.0 License)
fastText (MIT License)
Epitran (MIT License)