| title | The Art of Modeling. An Introduction to Linear Mixed Effects Models with R. |
|---|---|
| author | Klaus Frieler, Max Planck Institute for Empirical Aesthetics, Frankfurt/Main |
| date | 02.05.2025 |
Materials for a Linear Mixed Effects Models Workshop
The project includes several data sets that are used during the workshop.
Central are simulated data, which can be generated using simulate_lmm() in simulate_lmm.R.
The generated data are coming from a fully crossed design of n_rater ratings for n_item items
with respect to liking. The items are presented in two conditions, audio-only
(AO) and audio-visual (AV). The effect of condition on liking can be set be with the parameter fixef_beta,
the residual error with the parameter eps, the global intercept with y0.
The random effects are set with the parameter ranef which needs to be a nested list.
The parameter list has three lists with names intercept, beta, and cor on the top level, which
need to be lists with entries ratersand ìtems, containing the corresponding random effects parameters.
| Variable | Type | Values |
|---|---|---|
| rater | factor | [ r01 | r02 | r03 | r04 | ... ] |
| item | factor | [ i01 | i02 | i03 | i04 | ... ] |
| condition | factor | [ AO | AV ] |
| betas | numeric | |
| y1 | numeric | |
| liking | numeric |
This is an abridged version of the original data from Lange & Frieler, 2016.
The data come from an experiment where 20 audio engineers
(column p_id) rated 60 items (column s_id) on several scales.
The data set is filtered to included only cases which have a
definite tempo (column MIR.tempo), which are 53 in total.
The main ratings were perceived emotional expressions (columns angry, sad, happy, peacef, tender, fearf) on a scale from 1 to 7.
These were aggregated with factor analysis to two variables representing valence and arousal
(columns AROUSAL and VALENCE.)
Besides that, for each item there are 11 audio features extracted, where MIR.tempo was manually annotated (see table below).
Finally, there is a categorial variable (column emo), which contains a emotion label, derived as the emotion variable
with the highest mean across participant for this item.
| Variable | Type | Values |
|---|---|---|
| p_id | integer | 1–20 |
| s_id | integer | 1–60 |
| AROUSAL | numeric | |
| VALENCE | numeric | |
| angry | integer | 1–7 |
| sad | integer | 1–7 |
| happy | integer | 1–7 |
| peacef | integer | 1–7 |
| tender | integer | 1–7 |
| fearf | integer | 1–7 |
| MIR.low_energy_mean | numeric | |
| MIR.pitch_mean | numeric | |
| MIR.rms_mean | numeric | |
| MIR.regularity_mean | numeric | |
| MIR.keyclarity_mean | numeric | |
| MIR.mode_mean | numeric | |
| MIR.pulse_clarity_mean | numeric | |
| MIR.spectral_novelty_mean | numeric | |
| MIR.mode_std | numeric | |
| MIR.attacktime_std | numeric | |
| MIR.tempo | numeric | |
| emo | factor | [ angry | fearf | happy | peacef | tender | sad ] |
This is a selection of the Covox dataset (Bruder et al., 2024). The data comes from a rating experiment, where 60 raters
(column rater), assessed 330 items (column item) in total which respect to liking (column score).
The items were short monophonic audio excerpts by 22 singers which rendered 3 different melodies (column melody) in
5 different styles (column style), three singing styles (opera, pop, and lullaby) and two speech styles (infant directed ID and
adult directed AD, see also column speech).
Further, all participants rated all items on two different occasions (column time).
| Variable | Type | Values |
|---|---|---|
| singer | factor | [ singer01 | singer02 | singer03 | singer04 | ... ] |
| rater | character | [ r01 | r02 | r03 | r04 &vert& ... ] |
| style | factor | [ AD | ID | lullaby | opera | pop ] |
| melody | factor | [ CH | MS | NN ] |
| time | factor | [ T1 | T2 ] |
| score | integer | 1–9 |
| item | factor | [ singer01_CH_AD | singer01_CH_ID | singer01_CH_lullaby | singer01_CH_opera | ... ] |
| speech | logical | [TRUE | FALSE] |
This is a subset of the raw data from Will et al. 2024. In this experiment, 188 participants from five different age groups (column age_group),
had to tap to a beat in two different settings and for two different tempos. Precision (precision) was measured as the negative
logarithm of the standard deviation (column log_sd_ioi) of the produced inter-beat intervals (the higher, the better).
Furthermore, beat perception (beat_perc) and production (beat_prod) skills were assessed with a different test battery.
| Variable | Type | Values |
|---|---|---|
| p_id | factor | [ i_e_05_02 | i_e_06_02 | i_e_07_02 | i_e_09_02 | ... ] |
| age_group | factor | [ 5yo | 6yo | 7yo | 8yo | adults ] |
| setting | factor | [ metronome | social ] |
| experimenter | factor | [ i | l | w ] |
| tempo | factor | [ fast | slow ] |
| beat_perc | numeric | |
| beat_prod | numeric | |
| log_sd_ioi | numeric | |
| beat_ability | numeric | |
| precision | numeric |
This is simulated data with heteroscedastic and measurement error from a study by Müllensiefen & Frieler (in preparation).
Outcome is y, with two predictors x and z. True coefficients
are intercept = 1, beta_x = .4, and beta_z = .2. The original values, before applying measurement errors xe, ye, and ze', are in columns xt, ytandzt'. The level of measurement error can be high or low.
| Variable | Type | Values |
|---|---|---|
| y | numeric | |
| x | numeric | |
| xe | numeric | |
| xt | numeric | |
| zt | numeric | |
| ze | numeric | |
| z | numeric | |
| p_id | integer | 1–50 |
| me | character | [ low | vert ] |