---
title: '**Predictors of Climate Change Policy Support:<br> A Machine Learning Approach**'
author: "Dr. Fatih Uenal, Geneva, Switzerland"
output:
  html_document:
    toc: yes
    toc_float: true
    collapsed: false
    toc_depth: 3
    theme: lumen
    highlight: tango
    code_folding: show
    fig_width: 12
    fig_height: 8
---
```{r general-options-document-format, include=FALSE}
knitr::opts_chunk$set(tidy = 'styler', warning = FALSE, message = FALSE, echo = FALSE, error = FALSE) # Set general knitting options
options(tinytex.verbose = TRUE, dplyr.summarise.inform = FALSE) # Debugging & suppress summary info
```
```{r setup, include=FALSE}
# ## Install/Load required packages
library(pacman)
pacman::p_load(essurvey, readr, readxl, curl, tidyverse, broom, haven, foreign, labelled, skimr, lares, GGally, ggplot2, PerformanceAnalytics, fastmatch, Metrics, ipred, mlbench, RANN, highcharter, ggimage, countrycode, survey, srvyr, dplyr, questionr, sjmisc, car, data.table, ryouready)
# Load ML data
#load(file = "ml_imageX.RData")
# Load visualization data
load(file = "ess8_june_2023.RData")
```
```{r echo = F, warning = F, message = F, error = F}
options(highcharter.theme = hc_theme_hcrt(tooltip = list(valueDecimals = 2)))
map_tax_support
```
# 1. Overview
**This webpage is a companion piece to the paper entitled *Predictors of Climate Change Policy Support: A Machine Learning Approach*. A pre-print of the manuscript can be found [online](https://psyarxiv.com/65tx4/).**
In the manuscript, we apply machine learning techniques to a set of nationally representative surveys from 22 European countries and Israel to determine the relative influence on climate change policy preferences of a large number (151 predictors) of individual-level attitudes, beliefs, perceptions, and behaviors from the following groups:
* [1] media and social trust,
* [2] politics,
* [3] subjective well-being, social exclusion, religion, national and ethnic identity,
* [4] attitudes towards climate change,
* [5] energy security and energy preferences,
* [6] welfare attitudes,
* [7] human values and socio-demographic factors.

We also assess the influence of national-level indices of social, economic, ecological, and environmental development.
Here, I present more fine-grained descriptive statistics than the manuscript does, visualizing the relationship between various individual- and macro-level variables and the main variable of interest: support for fossil fuel taxation.
* Following this **Overview** section, the **Introduction** section provides contextual information on the data and the main research question of the manuscript.
* The **Data Download & Preparation** section describes how to download and prepare the data sets. The raw data set contains over 500 columns, most of which are irrelevant for this project. In this section I therefore also clean the data by removing unnecessary columns, such as 'administrative' columns, and describe how I arrived at the final, cleaned data set.
* The last section, **Exploratory Data Analysis (EDA)**, presents the fine-grained descriptive statistics that I created to augment the results presented in the manuscript. They provide a more in-depth look at the various data sets used in the manuscript.
**Please note that some of the following information is part of an academic manuscript that is currently under peer review.**
---
# 2. Introduction
Fossil-fueled climate change poses a significant threat to planetary and civilizational health (IPCC, 2022; Kemp et al., 2022). Mitigating climate change requires enacting far-reaching climate policies, such as restrictive supply-side policies (e.g., fossil-fuel non-proliferation treaty, limitations, moratoria, and bans; Newell & Simms, 2020) and regulatory demand-side policies (e.g., fossil-fuel taxation, incentivizing energy efficiency) that facilitate behavior changes towards low-carbon lifestyles. Demand-side policies that tax polluting sources of energy, such as fossil fuels, are among the most effective ways to curb emissions that harm the planet and human health (Carl & Fedor, 2016; Stoddard et al., 2021). Moreover, the income generated by fossil fuel taxation can be used to accelerate the low-carbon transition. However, 70% of energy-related CO2 emissions from advanced and emerging economies are entirely untaxed, offering little incentive to move to cleaner energy (OECD, 2019). Though the potential of fossil fuel taxation policies is immense, public opinion research indicates that support for them varies greatly across Europe (e.g., Dechezleprêtre et al., 2022; Fairbrother, Sevä, & Kulin, 2019; Harring, Jagers, & Matti, 2019).
Prior research has greatly improved our understanding of the complexity of support for climate change policies. However, the available research has yet to identify the most important and robust factors associated with this support (or lack thereof) among the many factors investigated so far (but see Lee et al., 2015, for climate change risk perceptions). For example, the analysis techniques typically used in previous research (e.g., linear regression, latent class analysis, multi-level regression) are suboptimal for comparing the predictive importance of a large number of constructs simultaneously (Lee et al., 2015; Yarkoni & Westfall, 2017). These methods also readily permit the overfitting of statistical models to specific datasets. As a consequence, previous research has not been able to meaningfully compare the relative importance of the growing list of constructs associated with climate change mitigation policy preferences (e.g., Fairbrother, Sevä, & Kulin, 2019; Goldberg et al., 2021; Kácha et al., 2022; Dechezleprêtre et al., 2022; Poortinga et al., 2019). Thus, at present, one of the key challenges to understanding the factors associated with climate change policy preferences is to make the existing knowledge cumulative by identifying the strongest and most robust predictors among the many. At the same time, relatively little research has explored cross-national differences in climate change mitigation policy support (but see Dechezleprêtre et al., 2022; Fairbrother, Sevä, & Kulin, 2019; Uenal et al., 2022), which is important because the influence of factors may differ between countries.
Using a machine learning approach (i.e., random forest) on data from nationally representative samples from 22 European countries and Israel, in the manuscript, we provide the first large-scale assessment of the highest number of factors (151 predictors) underlying climate change mitigation policy preferences to date. The data come from the [European Social Survey 8 (2016)](https://www.europeansocialsurvey.org/data/download.html?r=8). The ESS8 contains a measure of climate change mitigation policy preferences by asking ‘To what extent are you in favor or against the following policies in [country] to reduce climate change: increasing fossil-fuel taxation?’. We group responses to this policy item into three categories: ‘Opposed’, ‘Undecided’, and ‘Support’. Therefore, this analysis identifies the best predictors of climate change mitigation policy support across the entire spectrum of preferences, including the ‘undecided’ category that represents a relatively large share of respondents ranging from 16.75% in Ireland to 33.70% in the Russian Federation.
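The three-way grouping described above can be sketched as follows. This is a minimal illustration, not the manuscript's code: it assumes the policy item is stored in a column named `inctxff` and coded from 1 (strongly in favour) to 5 (strongly against), which should be verified against the ESS8 codebook. The helper `group_support` is hypothetical.

```r
# Toy responses; the column name and the 1-5 coding are assumptions
# that need to be checked against the ESS8 codebook.
toy <- data.frame(inctxff = c(1, 2, 3, 4, 5, NA, 2))

# Collapse the five response options into three policy-preference groups.
group_support <- function(x) {
  out <- rep(NA_character_, length(x))
  out[x %in% c(1, 2)] <- "Support"
  out[x %in% 3] <- "Undecided"
  out[x %in% c(4, 5)] <- "Opposed"
  factor(out, levels = c("Opposed", "Undecided", "Support"))
}

toy$tax_support <- group_support(toy$inctxff)
table(toy$tax_support, useNA = "ifany")
```

Keeping 'Undecided' as its own level, rather than dropping it or merging it into a neighbour, is what lets the analysis cover the entire spectrum of preferences.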
---
# 3. Data Download & Preparation
### 3.1 Data Access {.tabset}
#### Data Access Route 1
The data set is publicly available at the link below:
* [European Social Survey 8 (2016)](https://www.europeansocialsurvey.org/data/download.html?r=8)
#### Data Access Route 2
As an alternative to downloading from the link above, one can use the 'essurvey' package. A description of this package and its usage is available at the link below:
* ['essurvey'](https://www.r-bloggers.com/2014/03/analyze-the-european-social-survey-ess-with-r/)
### 3.2 Download
For this project, I originally downloaded the data using the 'essurvey' package. Please note that, to use 'essurvey', you first need to set up an account with the ESS and then supply your email address to download the data. The code to download the data via the 'essurvey' package is in the accompanying R script. However, due to privacy concerns, that code is not executed in this file. Instead, the data is available in my GitHub repository. As written, the chunk below loads the data from a local copy in the `data/` folder; a commented-out line downloads the same file directly from my GitHub repository instead.
```{r data-download, warning=FALSE, message=FALSE, echo=FALSE}
## Download ESS Data
# Set email for access
# set_email("ENTER YOUR EMAIL ADDRESS HERE")
## Download all countries round 8
# df8 <- import_rounds(8)
## Load data file locally if you download the data from edX
df8 <- read.csv("data/ESS8e02.1_F1.csv")
## Download data file from my github
# df8 <- read.csv(curl('https://raw.githubusercontent.com/FUenal/Harvard_Capstone_Project/main/ESS8e02.1_F1.csv'))
```
*PLEASE NOTE: The code is included in the Rmd file but hidden from the rendered output (echo=FALSE)*.
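One way to combine the local and the GitHub route is a small fallback helper. The sketch below is an assumption about how this could be wired up, not the code used in this project; `load_csv_with_fallback` is a hypothetical function name.

```r
# Read a CSV from a local path if it exists; otherwise read it from a URL.
# `load_csv_with_fallback` is a hypothetical helper, not part of any package.
load_csv_with_fallback <- function(local_path, remote_url) {
  if (file.exists(local_path)) {
    read.csv(local_path)
  } else {
    read.csv(remote_url)
  }
}

# Usage with the path and URL that appear in this document:
# df8 <- load_csv_with_fallback(
#   "data/ESS8e02.1_F1.csv",
#   "https://raw.githubusercontent.com/FUenal/Harvard_Capstone_Project/main/ESS8e02.1_F1.csv"
# )
```

The local copy is preferred so that the document still knits without a network connection.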
### 3.3 Preparation
As mentioned in the **Overview** section, the raw data set contains many variables that are irrelevant to this project. As visible below, the raw data set contains 534 columns, each representing one variable that corresponds to a survey question or an administrative item, and 44,387 rows, each corresponding to one surveyed person.
```{r data-peak-1, warning=FALSE, message=FALSE, echo=FALSE}
## First peek into raw data
df_str(df8)
# df_str(df8, return = "skimr")
```
The raw data set also contains many missing values as well as different types of variables (e.g., integer, character). At this point it is clear that data selection and cleaning require studying the survey documentation first. The survey documentation is detailed and thorough and can be accessed at this link:
* [ESS 8 Study Documentation](https://www.europeansocialsurvey.org/docs/round8/survey/ESS8_data_protocol_e01_4.pdf)
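A first look at column types and missingness needs only base R. The sketch below uses a small made-up data frame in place of `df8`; the column names are illustrative and the values are invented.

```r
# Toy stand-in for the raw survey data (values are made up).
toy_raw <- data.frame(
  cntry = c("DE", "FR", "IE", NA),
  agea = c(34L, NA, 51L, 47L),
  pweight = c(1.2, 0.8, NA, NA),
  stringsAsFactors = FALSE
)

dim(toy_raw)                    # number of rows and columns
table(sapply(toy_raw, class))   # how many columns of each type

# Share of missing values per column, largest first.
missing_share <- colMeans(is.na(toy_raw))
sort(missing_share, decreasing = TRUE)
```

Note that this only counts values already stored as `NA`. Numeric missing codes such as 77 or 99 are not detected until they have been recoded.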
Having studied the data documentation, I decided to remove many columns from the data set. I explain my choices below:
* Since I am interested in analyzing "climate change policy preferences (fossil fuel taxation)" across countries and not for each country individually, the country-specific variables are of no relevance for my goal. I therefore remove the country-specific variables altogether.
* For similar reasons, I also don't need the sampling stratification weights and population weights and remove those from the data set as well.
* The data set also contains a number of so-called 'administrative variables', such as interview time and date, which are not relevant for my goals and are thus removed.
* The survey documentation also lists several binary variables that are not relevant to this project's main objective, and I remove those as well.
* Lastly, the survey contains many missing values, which are coded in different ways. For example, invalid answers (e.g., refusal, don't know, missing) are encoded with the numbers 66, 77, 99 for some features and with 6, 7, 8, 9 for others. In general, the question encoding varies considerably, which may be the result of many different groups of people working on this survey. I use the 'essurvey' package to automatically recode all invalid answers as missing (NA) values. See the documentation of this package at the link provided above.
Given the large number of specific cleaning tasks, I manually went through the documentation, retrieved all the information needed for the cleaning, and removed all irrelevant variables.
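The two main cleaning steps, dropping irrelevant columns and recoding numeric missing codes, can be sketched in base R as below. This is an illustration under assumptions, not the cleaning script used here: the column names are illustrative, the values are made up, `recode_missing_codes` is a hypothetical helper, and the set of missing codes for each variable has to be taken from the ESS documentation.

```r
# Toy data (made-up values); column names are illustrative.
toy <- data.frame(
  inwyys = c(2016, 2016, 2017),  # administrative: interview year
  trstep = c(5, 77, 99),         # wide scale with two-digit missing codes
  inctxff = c(2, 8, 4)           # short scale with one-digit missing codes
)

# 1. Drop administrative columns by name.
admin_cols <- c("inwyys")
toy <- toy[, !(names(toy) %in% admin_cols), drop = FALSE]

# 2. Replace numeric missing codes with NA, one column at a time.
recode_missing_codes <- function(x, codes) {
  x[x %in% codes] <- NA
  x
}
missing_codes <- list(
  trstep = c(66, 77, 99),
  inctxff = c(6, 7, 8, 9)
)
for (col in names(missing_codes)) {
  toy[[col]] <- recode_missing_codes(toy[[col]], missing_codes[[col]])
}
toy
```

The codes are listed per column because the same number can be a valid answer on one scale and a missing code on another; for example, 7 is a valid answer on a 0 to 10 scale.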
```{r data-peak-2, warning=FALSE, message=FALSE, echo=FALSE}
## From here on, the cleaned and smaller subset of the data is used
## Load cleaned data set locally
df <- read.csv("data/ess8_april_2023.csv") # If you download the data from edX
## Download data file from my github
# df <- read.csv(curl('https://raw.githubusercontent.com/FUenal/Harvard_Capstone_Project/main/ess8_subsample.csv'))
## Random sample for testing purposes: You can choose to work with an even smaller subsample to have shorter run times
# df <- sample_n(df, 500)
## First peek into cleaned data
# df_str(df)
# df_str(df, return = "skimr")
```
The cleaned data set now contains fewer missing values but still a large number of columns (151).
---
# 4. Exploratory Data Analysis (EDA)
## 4.1. Climate Change Policy Support Across Europe
We'll first take a look at the weighted average percentage of climate change policy support, and at other variables in the dataset, by mapping them. How do European countries differ in their awareness of and concern about climate change? What percentage of each country's population feels responsible for climate change? How high is the support for climate change mitigation policies? And finally, to what do European citizens attribute climate change (man-made vs. natural vs. denial)?
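The maps below are pre-computed objects loaded from the `.RData` file. As a rough sketch of the underlying quantity, a weighted percentage per country can be computed in base R as follows; the column names `cntry`, `tax_support`, and `weight` are assumptions, the values are made up, and `weighted_pct` is a hypothetical helper.

```r
# Toy respondents (made-up values); column names are assumptions.
toy <- data.frame(
  cntry = c("DE", "DE", "DE", "IE", "IE"),
  tax_support = c("Support", "Opposed", "Support", "Undecided", "Support"),
  weight = c(1.0, 2.0, 1.0, 0.5, 1.5)
)

# Weighted share (in %) of respondents in one category, per country.
weighted_pct <- function(data, category) {
  in_cat <- data$tax_support == category
  num <- tapply(data$weight * in_cat, data$cntry, sum)
  den <- tapply(data$weight, data$cntry, sum)
  100 * num / den
}

weighted_pct(toy, "Support")
```

For design-based standard errors, the 'survey' and 'srvyr' packages loaded in the setup chunk are the more appropriate tools; this sketch only reproduces the point estimate.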
### {.tabset}
#### Climate Change: Awareness
```{r}
map_aware
```
#### Climate Change: Threat Perceptions
```{r}
map_worried
```
#### Climate Change: Cause Attributions
```{r}
hw_grid(map_cause_humans, map_cause_nature, map_cause_denial,
ncol = 3)
```
#### Climate Change: Impact Valence
```{r}
hw_grid(map_bad_impact, map_good_impact,
ncol = 2)
```
#### Climate Change: Feeling Responsibility
```{r}
hw_grid(map_responsible, map_not_responsible,
ncol = 2)
```
#### Climate Change: Support for Mitigation Policies
```{r}
hw_grid(map_tax_support, map_tax_opposition, map_tax_undecided,
ncol = 3)
```
## 4.2 National Level Predictors of Climate Change Policy Preferences
As we have seen in the previous visualizations, European citizens differ in their level of support for climate change mitigation policies. Next, we'll gain more in-depth insights by visualizing the relationship between our main variable of interest, "climate change policy preferences (fossil fuel taxation)", and the available features from other datasets. Previous research has identified dozens of factors that shape whether people support climate change policies. These features can be divided into *National-level factors*, such as economic, political, and environmental factors, and *Individual-level factors*, such as beliefs, values, and ideologies. The data set contains many such variables, and I present only a subset of them below.
We'll start by analyzing some *National-level factors*.
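Before plotting, a national-level relationship can be summarized by joining country-level support to a country-level index and computing a rank correlation. The sketch below uses placeholder countries and invented numbers, so it says nothing about the actual data; the column names are assumptions.

```r
# Placeholder countries and made-up numbers, for illustration only.
support <- data.frame(
  cntry = c("A", "B", "C", "D"),
  pct_support = c(20, 35, 30, 50)
)
indices <- data.frame(
  cntry = c("A", "B", "C", "D"),
  gdp_pc = c(15, 30, 28, 45)
)

# Join the two country-level tables on the country code.
national <- merge(support, indices, by = "cntry")

# Spearman's rank correlation is less sensitive to outlying countries
# than Pearson's r, which matters with only a couple of dozen countries.
rho <- cor(national$pct_support, national$gdp_pc, method = "spearman")
rho
```

With so few countries, any such coefficient is descriptive at best, which is one reason to inspect the scatter plots below.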
### Economic Factors {.tabset}
#### Gross Domestic Product (GDP)
```{r}
hc_plot_GDP
```
#### Socio-Economic Inequality (GINI Coefficient)
```{r}
hc_plot_Gini
```
### Environmental Factors {.tabset}
#### Environmental Performance (EPI)
```{r}
hc_plot_EPI
```
#### Environmental Health Index
```{r}
hc_plot_Environmental_Health
```
#### Ecosystem Vitality Index
```{r}
hc_plot_Ecosystem_Vitality
```
### Political Factors {.tabset}
#### Environmental Policy Stringency
```{r}
hc_plot_EnvironMental_Policy_Stringency
```
#### Green Party Seats Parliament
```{r}
hc_plot_Green_Party_Seats_Parliament
```
#### Number of Supply-Side Restriction Fossil-Fuel Policies
```{r}
hc_plot_Policy_total
```
### Risk Factors {.tabset}
#### Climate Risk Index (CRI)
```{r}
hc_plot_cri
```
#### Country Greenhouse Gas Emissions (MTCO2e)
```{r}
hc_plot_MTCO2e
```
#### Domestic Material Consumption (DMC)
```{r}
hc_plot_DMC
```
#### Fossil-Fuel Share in Overall Energy Consumption (Sub-energy %)
```{r}
hc_plot_fossil_fuel_share_energy_2019
```
## 4.3 Individual Level Predictors of Climate Change Policy Preferences
Next, we'll visualize the relationship between individual-level factors, such as values, beliefs, and ideology, and our outcome variable, climate change policy preferences.
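For a single individual-level predictor, the relationship shown in each plot can be approximated with a cross-tabulation of row percentages. The sketch below uses made-up data; the binary predictor `impenv_high` (high vs. low environmental values) is a hypothetical simplification.

```r
# Toy data (made-up); `impenv_high` is a hypothetical binary predictor.
toy <- data.frame(
  impenv_high = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  tax_support = factor(
    c("Support", "Support", "Undecided", "Opposed", "Opposed", "Support"),
    levels = c("Opposed", "Undecided", "Support")
  )
)

# Counts by predictor level and outcome category.
counts <- table(toy$impenv_high, toy$tax_support)

# Row percentages: distribution of policy preference within each level.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```

Row percentages, rather than raw counts, make groups of different sizes comparable.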
### Human values {.tabset}
#### Environmental Values
```{r}
hc_plot_impenv
```
#### Materialism
```{r}
hc_plot_imprich
```
#### Empathy and Sociality
```{r}
hc_plot_iphlppl
```
### Politics {.tabset}
#### Political Orientation
```{r}
hc_plot_1
```
#### Trust in the European Parliament
```{r}
hc_plot_trsteuparl
```
#### Immigration bad or good for country's economy
```{r}
hc_plot_immigr
```
### Welfare attitudes {.tabset}
#### Government should reduce differences in income levels
```{r}
hc_plot_equal
```
#### Attitudes toward a basic income scheme
```{r}
hc_plot_basinc
```
#### Risk of Unemployment
```{r}
hc_plot_lkuemp
```
### Media & Social Trust {.tabset}
#### Internet Usage
```{r}
hc_plot_intuse
```
#### News Consumption (in minutes per day)
```{r}
hc_plot_nwspol
```
#### General Trust in People
```{r}
hc_plot_ppltrst
```
### Social and Demographic Factors {.tabset}
#### Education
#### Income
#### Gender
# Bibliography
1. IPCC. (2022). Sixth Assessment Report: Working Group III: Climate Change 2022: Mitigation of Climate Change. Retrieved from
https://www.ipcc.ch/report/ar6/wg3/
2. Kemp, L., Xu, C., Depledge, J., Ebi, K. L., Gibbins, G., Kohler, T. A., ... & Lenton, T. M. (2022). Climate Endgame: Exploring catastrophic climate change scenarios.
Proceedings of the National Academy of Sciences, 119(34), e2108146119.
https://doi.org/10.1073/pnas.2108146119
3. Newell, P., & Simms, A. (2020). Towards a fossil fuel non-proliferation treaty. Climate Policy, 20(8), 1043-1054.
https://doi.org/10.1080/14693062.2019.1636759
4. Carl, J., & Fedor, D. (2016). Tracking global carbon revenues: a survey of carbon taxes versus cap-and-trade in the real world. Energy Policy 96, 50–77.
5. Stoddard, I., Anderson, K., Capstick, S., Carton, W., Depledge, J., Facer, K., ... Hultman, M. (2021). Three decades of climate mitigation: why haven’t we bent the global
emissions curve? Annual Review of Environment and Resources, 46, 653‐689.
6. OECD (2019), Taxing Energy Use 2019: Using Taxes for Climate Action, OECD Publishing, Paris, https://doi.org/10.1787/058ca239-en.
7. Dechezleprêtre, A., Fabre, A., Kruse, T., Planterose, B., Chico, A. S., & Stantcheva, S. (2022). Fighting climate change: International attitudes toward climate policies (No. w30265). National Bureau of Economic Research. http://www.nber.org/papers/w30265
8. Fairbrother, M., Sevä, I. J., & Kulin, J. (2019). Political trust and the relationship between climate change beliefs and support for fossil fuel taxes: Evidence from a survey of 23 European countries. Global Environmental Change, 59, 102003. https://doi.org/10.1016/j.gloenvcha.2019.102003
9. Harring, N., Jagers, S. C., & Matti, S. (2019). The significance of political culture, economic context and instrument type for climate policy support: a cross-national study. Climate Policy, 19(5), 636-650. https://doi.org/10.1080/14693062.2018.1547181
10. Bumann, S. (2021). What are the Determinants of Public Support for Climate Policies? A Review of the Empirical Literature. Review of Economics, 72(3), 213-228. https://doi.org/10.1515/roe-2021-0046
11. Drews, S., & van den Bergh, J. C. J. M. (2016). What explains public support for climate policies? A review of empirical and experimental studies. Climate Policy, 16, 855–876.
12. Leiserowitz, A., (2006). Climate change risk perception and policy preferences: the role of affect, imagery, and values. Clim. Change 77, 45–72.
https://doi.org/10.1007/s10584-006-9059-9
13. Goldberg, M. H., Gustafson, A., Ballew, M. T., Rosenthal, S. A., & Leiserowitz, A. (2021). Identifying the most important predictors of support for climate policy in the
United States. Behavioural Public Policy, 5(4), 480-502. https://doi.org/10.1017/bpp.2020.39
14. Poortinga, W., Whitmarsh, L., Steg, L., Böhm, G., & Fisher, S. (2019). Climate change perceptions and their individual-level determinants: A cross-European analysis.
Global environmental change, 55, 25-35. https://doi.org/10.1016/j.gloenvcha.2019.01.007
15. Kácha, O., Vintr, J., & Brick, C. (2022). Four Europes: Climate change beliefs and attitudes predict behavior and policy preferences using a latent class analysis on 23
countries. Journal of Environmental Psychology, 81, 101815. https://doi.org/10.1016/j.jenvp.2022.101815
16. Uenal, F., Sidanius, J., & van der Linden, S. (2022). Social and ecological dominance orientations: Two sides of the same coin? Social and ecological dominance
orientations predict decreased support for climate change mitigation policies. Group Processes & Intergroup Relations, 25(6), 1555-1576.
https://doi.org/10.1177/13684302211010923
17. Lee, T. M., Markowitz, E. M., Howe, P. D., Ko, C. Y., & Leiserowitz, A. A. (2015). Predictors of public climate change awareness and risk perception around the world.
Nature climate change, 5(11), 1014-1020. https://doi.org/10.1038/nclimate2728
18. Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6),
1100-1122. https://doi.org/10.1177/1745691617693393
19. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. [DOI](https://link.springer.com/article/10.1023%2FA%3A1010933404324)
20. Gelfand, M. J., Raver, J. L., Nishii, L., Leslie, L. M., Lun, J., Lim, B. C., ... Yamaguchi, S. (2011). Differences between tight and loose cultures: A 33-nation study. Science, 332(6033), 1100–1104. [DOI](https://science.sciencemag.org/content/332/6033/1100)
21. Kunst, J. R., Fischer, R., Sidanius, J., & Thomsen, L. (2017). Preferences for group dominance track and mediate the effects of macro-level social inequality and violence across societies. PNAS, 114(21), 5407–5412. [DOI](https://www.pnas.org/content/114/21/5407)
22. Sheehy-Skeffington, J., & Thomsen, L. (2020, April 1). Egalitarianism: psychological and socio-ecological foundations. Current Opinion in Psychology. Elsevier B.V.
[DOI](https://www.sciencedirect.com/science/article/pii/S2352250X1930137X?via%3Dihub)
23. Sidanius, J., & Pratto, F. (1999). Social Dominance. Cambridge University Press.
[DOI](https://www.cambridge.org/core/books/social-dominance/ADA29C256881001463D6E2777404DB95)
# Appendix
## Session Info
```{r message=FALSE, warning=FALSE, echo=FALSE}
sessionInfo(package = NULL)
```
## Benchmark:
| Machine | Time |
|------------------------|--------:|
| MacBook Pro 8GB | 25'41 |