Yoon Mi Oh and François Pellegrino
This repo contains the Supplementary Information for the paper "Towards robust complexity indices in linguistic typology: a corpus-based assessment".
- data.txt: the txt file providing most of the information for each language
- dataMC.txt: the values of four morphological complexity metrics (WID, TTR, MTLD, H) obtained by different corpus sampling configurations (Whole, 5, 10, 20, 40, and 60 subsets)
- allWID.txt: the average WID estimated from three different corpus configurations (WID_FP, WID_PP, and WID_NP)
- Figure 1.png: the image file for Figure 1 not generated by R code
- Figure 12.png: the image file for Figure 12 not generated by R code
- surprisal.txt: English surprisal estimated at the verse level with the lm-scorer package downloaded from https://github.com/simonepri/lm-scorer using the GPT-2 model
- WID_PP_NP.txt: WID_PP and WID_NP calculated with randomized English surprisal
- rWID_PP_NP.txt: Spearman's correlation coefficient between WID_PP and WID_NP
- SupplementaryInfo.Rmd: the RMarkdown file incorporating the analysis code, the main results detailed in the paper and results from additional analyses
- SupplementaryInfo.html: the resulting HTML file