surveys | Reprex

Where Are People More Likely To Treat Climate Change as the Most Serious Global Problem?

Sat, 06 Mar 2021 00:00:00 +0000

library(regions)
library(lubridate)
library(dplyr)
if ( dir.exists('data-raw') ) {
data_raw_dir <- "data-raw"
} else {
data_raw_dir <- file.path("..", "..", "data-raw")
}

The first results of our longitudinal table were difficult to map, because the surveys used an obsolete regional coding. We will adjust the wrong coding, when possible, and join the data with the European Environment Agency’s (EEA) Air Quality e-Reporting (AQ e-Reporting) data on environmental pollution. We recoded the annual level for every available reporting stations [not shown here] and all values are in μg/m3. The period under observation is 2014-2016. Data file: https://www.eea.europa.eu/data-and-maps/data/aqereporting-8 (European Environment Agency 2021).

Recoding the Regions

Recoding means that the boundaries are unchanged, but the country changed the names and codes of regions because there were other boundary changes which did not affect our observation unit. We explain the problem and the solution in greater detail in our tutorial that aggregates the data on regional levels.

panel <- readRDS((file.path(data_raw_dir, "climate-panel.rds")))
climate_data_geocode <- panel %>%
mutate ( year: lubridate::year(date_of_interview)) %>%
recode_nuts()

Let’s join the air pollution data and join it by corrected geocodes:

load(file.path("data", "air_pollutants.rda")) ## good practice to use system-independent file.path
climate_awareness_air <- climate_data_geocode %>%
rename ( region_nuts_codes : .data$code_2016) %>%
left_join ( air_pollutants, by: "region_nuts_codes" ) %>%
select ( -all_of(c("w1", "wex", "date_of_interview",
"typology", "typology_change", "geo", "region"))) %>%
mutate (
# remove special labels and create NA_numeric_
age_education: retroharmonize::as_numeric(age_education)) %>%
mutate_if ( is.character, as.factor) %>%
mutate (
# we only have responses from 4 years, and this should be treated as a categorical variable
year: as.factor(year)
) %>%
filter ( complete.cases(.) )

The climate_awareness_air data frame contains the answers of 75086 individual respondents. 17.07% thought that climate change was the most serious world problem and 33.6% mentioned climate change as one of the three most important global problems.

summary ( climate_awareness_air )
## rowid serious_world_problems_first
## ZA5877_v2-0-0_1 : 1 Min. :0.0000
## ZA5877_v2-0-0_10 : 1 1st Qu.:0.0000
## ZA5877_v2-0-0_100 : 1 Median :0.0000
## ZA5877_v2-0-0_1000 : 1 Mean :0.1707
## ZA5877_v2-0-0_10000: 1 3rd Qu.:0.0000
## ZA5877_v2-0-0_10001: 1 Max. :1.0000
## (Other) :75080
## serious_world_problems_climate_change isocntry
## Min. :0.000 BE : 3028
## 1st Qu.:0.000 CZ : 3023
## Median :0.000 NL : 3019
## Mean :0.336 SK : 3000
## 3rd Qu.:1.000 SE : 2980
## Max. :1.000 DE-W : 2978
## (Other):57058
## marital_status age_education
## (Re-)Married: without children :13242 18 :15485
## (Re-)Married: children this marriage :12696 19 : 7728
## Single: without children : 7650 16 : 5840
## (Re-)Married: w children of this marriage: 6520 still studying: 5098
## (Re-)Married: living without children : 6225 17 : 5092
## Single: living without children : 4102 15 : 4528
## (Other) :24651 (Other) :31315
## age_exact occupation_of_respondent
## Min. :15.0 Retired, unable to work :22911
## 1st Qu.:36.0 Skilled manual worker : 6774
## Median :51.0 Employed position, at desk : 6716
## Mean :50.1 Employed position, service job: 5624
## 3rd Qu.:65.0 Middle management, etc. : 5252
## Max. :99.0 Student : 5098
## (Other) :22711
## occupation_of_respondent_recoded
## Employed (10-18 in d15a) :32763
## Not working (1-4 in d15a) :37125
## Self-employed (5-9 in d15a): 5198
##
##
##
##
## respondent_occupation_scale_c_14
## Retired (4 in d15a) :22911
## Manual workers (15 to 18 in d15a) :15269
## Other white collars (13 or 14 in d15a): 9203
## Managers (10 to 12 in d15a) : 8291
## Self-employed (5 to 9 in d15a) : 5198
## Students (2 in d15a) : 5098
## (Other) : 9116
## type_of_community is_student no_education
## DK : 34 Min. :0.0000 Min. :0.000000
## Large town :20939 1st Qu.:0.0000 1st Qu.:0.000000
## Rural area or village :24686 Median :0.0000 Median :0.000000
## Small or middle sized town: 9850 Mean :0.0679 Mean :0.008151
## Small/middle town :19577 3rd Qu.:0.0000 3rd Qu.:0.000000
## Max. :1.0000 Max. :1.000000
##
## education year region_nuts_codes country_code
## Min. :14.00 2013:25103 LU : 1432 DE : 4531
## 1st Qu.:17.00 2015: 0 MT : 1398 GB : 3538
## Median :18.00 2017:25053 CY : 1192 BE : 3028
## Mean :19.61 2019:24930 SK02 : 1053 CZ : 3023
## 3rd Qu.:22.00 EL30 : 974 NL : 3019
## Max. :30.00 EE : 973 SK : 3000
## (Other):68064 (Other):54947
## pm2_5 pm10 o3 BaP
## Min. : 2.109 Min. : 5.883 Min. : 66.37 Min. :0.0102
## 1st Qu.: 9.374 1st Qu.: 28.326 1st Qu.: 90.89 1st Qu.:0.1779
## Median :11.866 Median : 33.673 Median :102.81 Median :0.4105
## Mean :12.954 Mean : 38.637 Mean :101.49 Mean :0.8759
## 3rd Qu.:15.890 3rd Qu.: 49.488 3rd Qu.:110.73 3rd Qu.:1.0692
## Max. :41.293 Max. :123.239 Max. :141.04 Max. :7.8050
##
## so2 ap_pc1 ap_pc2 ap_pc3
## Min. : 0.0000 Min. :-4.6669 Min. :-2.21851 Min. :-2.1007
## 1st Qu.: 0.0000 1st Qu.:-0.4624 1st Qu.:-0.49130 1st Qu.:-0.5695
## Median : 0.0000 Median : 0.4263 Median : 0.02902 Median :-0.1113
## Mean : 0.1032 Mean : 0.1031 Mean : 0.04166 Mean :-0.1746
## 3rd Qu.: 0.0000 3rd Qu.: 0.9748 3rd Qu.: 0.57416 3rd Qu.: 0.3309
## Max. :42.5325 Max. : 2.0344 Max. : 3.25841 Max. : 4.1615
##
## ap_pc4 ap_pc5
## Min. :-1.7387 Min. :-2.75079
## 1st Qu.:-0.1669 1st Qu.:-0.18748
## Median : 0.0371 Median : 0.01811
## Mean : 0.1154 Mean : 0.06797
## 3rd Qu.: 0.3050 3rd Qu.: 0.34937
## Max. : 3.2476 Max. : 1.42816
##

Let’s see a simple CART tree! We remove the regional codes, because there are very serious differences among regional climate awareness. These differences, together with education level, and the year we are talking about, are the most important predictors of thinking about climate change as the most important global problem in Europe.

# Classification Tree with rpart
library(rpart)
# grow tree
fit <- rpart(as.factor(serious_world_problems_first) ~ .,
method="class", data=climate_awareness_air %>%
select ( - all_of(c("rowid", "region_nuts_codes"))),
control: rpart.control(cp: 0.005))
printcp(fit) # display the results
##
## Classification tree:
## rpart(formula: as.factor(serious_world_problems_first) ~ .,
## data: climate_awareness_air %>% select(-all_of(c("rowid",
## "region_nuts_codes"))), method: "class", control: rpart.control(cp: 0.005))
##
## Variables actually used in tree construction:
## [1] age_education isocntry
## [3] serious_world_problems_climate_change year
##
## Root node error: 12817/75086: 0.1707
##
## n= 75086
##
## CP nsplit rel error xerror xstd
## 1 0.0240566 0 1.00000 1.00000 0.0080438
## 2 0.0082703 3 0.92783 0.92783 0.0078055
## 3 0.0050000 5 0.91129 0.91425 0.0077588
plotcp(fit) # visualize cross-validation results

summary(fit) # detailed summary of splits
## Call:
## rpart(formula: as.factor(serious_world_problems_first) ~ .,
## data: climate_awareness_air %>% select(-all_of(c("rowid",
## "region_nuts_codes"))), method: "class", control: rpart.control(cp: 0.005))
## n= 75086
##
## CP nsplit rel error xerror xstd
## 1 0.024056592 0 1.0000000 1.0000000 0.008043837
## 2 0.008270266 3 0.9278302 0.9278302 0.007805478
## 3 0.005000000 5 0.9112897 0.9142545 0.007758824
##
## Variable importance
## serious_world_problems_climate_change isocntry
## 31 26
## country_code BaP
## 20 8
## pm2_5 ap_pc1
## 4 3
## age_education pm10
## 2 2
## education ap_pc2
## 2 1
## year
## 1
##
## Node number 1: 75086 observations, complexity param=0.02405659
## predicted class=0 expected loss=0.1706976 P(node): 1
## class counts: 62269 12817
## probabilities: 0.829 0.171
## left son=2 (25229 obs) right son=3 (49857 obs)
## Primary splits:
## serious_world_problems_climate_change < 0.5 to the right, improve=2214.2040, (0 missing)
## isocntry splits as RRLLLRRRLLRLRLLLLLLLLLLRRLLLRLL, improve= 728.0160, (0 missing)
## country_code splits as RRLLLRRLLRLLLLLLLLLLRRLLLRLL, improve= 673.3656, (0 missing)
## BaP < 0.4300347 to the right, improve= 310.6229, (0 missing)
## pm2_5 < 13.38264 to the right, improve= 296.4013, (0 missing)
## Surrogate splits:
## age_education splits as ----RRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRL-RRR-RRRRRRRRR--RRRLLR--R-R, agree=0.664, adj=0, (0 split)
## pm10 < 7.491315 to the left, agree=0.664, adj=0, (0 split)
##
## Node number 2: 25229 observations
## predicted class=0 expected loss=0 P(node): 0.3360014
## class counts: 25229 0
## probabilities: 1.000 0.000
##
## Node number 3: 49857 observations, complexity param=0.02405659
## predicted class=0 expected loss=0.2570752 P(node): 0.6639986
## class counts: 37040 12817
## probabilities: 0.743 0.257
## left son=6 (34631 obs) right son=7 (15226 obs)
## Primary splits:
## isocntry splits as RRLLLRRRLLRLRLLLLLLLLLLRRLLLRLL, improve=1454.9460, (0 missing)
## country_code splits as RRLLLRRLLRLLLLLLLLLLRRLLLRLL, improve=1359.7210, (0 missing)
## BaP < 0.4300347 to the right, improve= 629.8844, (0 missing)
## pm2_5 < 13.38264 to the right, improve= 555.7484, (0 missing)
## ap_pc1 < -0.005459537 to the left, improve= 533.3579, (0 missing)
## Surrogate splits:
## country_code splits as RRLLLRRLLRLLLLLLLLLLRRLLLRLL, agree=0.987, adj=0.957, (0 split)
## BaP < 0.1749425 to the right, agree=0.775, adj=0.264, (0 split)
## pm2_5 < 5.206993 to the right, agree=0.737, adj=0.140, (0 split)
## ap_pc1 < 1.405527 to the left, agree=0.733, adj=0.126, (0 split)
## pm10 < 25.31211 to the right, agree=0.718, adj=0.076, (0 split)
##
## Node number 6: 34631 observations
## predicted class=0 expected loss=0.1769802 P(node): 0.4612178
## class counts: 28502 6129
## probabilities: 0.823 0.177
##
## Node number 7: 15226 observations, complexity param=0.02405659
## predicted class=0 expected loss=0.4392487 P(node): 0.2027808
## class counts: 8538 6688
## probabilities: 0.561 0.439
## left son=14 (11607 obs) right son=15 (3619 obs)
## Primary splits:
## isocntry splits as LL---LLR--L-L----------LL---R--, improve=337.5462, (0 missing)
## country_code splits as LL---LR--L-L--------LL---R--, improve=337.5462, (0 missing)
## age_education splits as ----LLLLLL-LLLRRRRRRR-RRRRRRRRRL-RRRRRRLLRR-RRRRLLRLRL-RRLRRR-RRR-LLLLRRR-----LR-----L-R, improve=294.0807, (0 missing)
## education < 22.5 to the left, improve=262.3747, (0 missing)
## BaP < 0.053328 to the right, improve=232.7043, (0 missing)
## Surrogate splits:
## BaP < 0.053328 to the right, agree=0.878, adj=0.485, (0 split)
## pm2_5 < 4.810361 to the right, agree=0.827, adj=0.271, (0 split)
## ap_pc2 < 0.8746175 to the left, agree=0.792, adj=0.124, (0 split)
## so2 < 0.3302972 to the left, agree=0.781, adj=0.078, (0 split)
## age_education splits as ----LLLLLL-LLLLLLLRLR-LRRLRRRRRR-RRRRLLLLLR-LRLRLLRRLL-LLRLLR-LLR-RRLLLLL-----RR-----R-L, agree=0.779, adj=0.071, (0 split)
##
## Node number 14: 11607 observations, complexity param=0.008270266
## predicted class=0 expected loss=0.3804601 P(node): 0.1545827
## class counts: 7191 4416
## probabilities: 0.620 0.380
## left son=28 (7462 obs) right son=29 (4145 obs)
## Primary splits:
## age_education splits as ----LLLLLL-LRRRRRRRRR-RRLRRLRRLL-RRRRLRLLRR-RLRLLLRLRL-RR-RR--RRL-L-LLRRR------------L-R, improve=123.71070, (0 missing)
## year splits as R-LR, improve=107.79460, (0 missing)
## education < 20.5 to the left, improve= 90.28724, (0 missing)
## occupation_of_respondent splits as LRRLRRRRRLRLLLRLLL, improve= 84.62865, (0 missing)
## respondent_occupation_scale_c_14 splits as LRLLLRRL, improve= 68.88653, (0 missing)
## Surrogate splits:
## education < 20.5 to the left, agree=0.950, adj=0.861, (0 split)
## occupation_of_respondent splits as LLLLRLLRRLRLLLRLLL, agree=0.738, adj=0.267, (0 split)
## respondent_occupation_scale_c_14 splits as LRLLLLRL, agree=0.733, adj=0.251, (0 split)
## is_student < 0.5 to the left, agree=0.709, adj=0.186, (0 split)
## age_exact < 23.5 to the right, agree=0.676, adj=0.094, (0 split)
##
## Node number 15: 3619 observations
## predicted class=1 expected loss=0.3722023 P(node): 0.04819807
## class counts: 1347 2272
## probabilities: 0.372 0.628
##
## Node number 28: 7462 observations
## predicted class=0 expected loss=0.326052 P(node): 0.09937938
## class counts: 5029 2433
## probabilities: 0.674 0.326
##
## Node number 29: 4145 observations, complexity param=0.008270266
## predicted class=0 expected loss=0.4784077 P(node): 0.05520337
## class counts: 2162 1983
## probabilities: 0.522 0.478
## left son=58 (2573 obs) right son=59 (1572 obs)
## Primary splits:
## year splits as L-LR, improve=40.13885, (0 missing)
## occupation_of_respondent splits as LRLLRRRRRLRLLLRLLL, improve=18.33254, (0 missing)
## marital_status splits as LRRRLRRRLRRLRLLRRRRRRLRLRLLRR, improve=17.86888, (0 missing)
## type_of_community splits as LRLRL, improve=17.55254, (0 missing)
## age_education splits as ------------LLRRRRRRR-RR-RL-RR---LRRR-R--LR-R-R---R-R--RR-RR--RR------RRR--------------R, improve=14.66121, (0 missing)
## Surrogate splits:
## type_of_community splits as LLLRL, agree=0.777, adj=0.412, (0 split)
## marital_status splits as RRLLLLLRLLLLLLLRRRLLLLLLRLRLL, agree=0.680, adj=0.155, (0 split)
## isocntry splits as LL---LL---L-R----------LL------, agree=0.669, adj=0.127, (0 split)
## country_code splits as LL---L---L-R--------LL------, agree=0.669, adj=0.127, (0 split)
## o3 < 83.06345 to the right, agree=0.650, adj=0.076, (0 split)
##
## Node number 58: 2573 observations
## predicted class=0 expected loss=0.4240187 P(node): 0.03426737
## class counts: 1482 1091
## probabilities: 0.576 0.424
##
## Node number 59: 1572 observations
## predicted class=1 expected loss=0.43257 P(node): 0.02093599
## class counts: 680 892
## probabilities: 0.433 0.567
# plot tree
plot(fit, uniform=TRUE,
main="Classification Tree: Climate Change Is The Most Serious Threat")
text(fit, use.n=TRUE, all=TRUE, cex=.8)
## Warning in labels.rpart(x, minlength: minlength): more than 52 levels in a
## predicting factor, truncated for printout

saveRDS ( climate_awareness_air , file.path(tempdir(), "climate_panel_recoded.rds"), version: 2)
# not evaluated
saveRDS( climate_awareness_air, file: file.path("data-raw", "climate-panel_recoded.rds"))

What is Retrospective Survey Harmonization?

Thu, 04 Mar 2021 00:00:00 +0000

Reproducible ex post harmonization of survey microdata

Retrospective survey harmonization allows the comparison of opinion poll data conducted in different countries or time. In this example we are working with data from surveys that were ex ante harmonized to a certain degree – in our tutorials we are choosing questions that were asked in the same way in many natural languages. For example, you can compare what percentage of the European people in various countries, provinces and regions thought climate change was a serious world problem back in 2013, 2015, 2017 and 2019.

We developed the retroharmonize R package to help this process. We have tested the package with about 80 Eurobarometer, 5 Afrobarometer survey files extensively, and a bit with Arabbarometer files. This allows the comparison of various survey answers in about 70 countries. This policy-oriented survey programs were designed to be harmonized to a certain degree, but their ex post harmonization is still necessary, challenging and errorprone. Retrospective harmonization includes harmonization of the different coding used for questions and answer options, post-stratification weights, and using different file formats.

Eurobarometer, Afrobaromer, Arab Barometer and Latinobarómetro make survey files that are harmonized across countries available for research with various terms. Our retroharmonize is not affiliated with them, and to run our examples, you must visit their websites, carefully read their terms, agree to them, and download their data yourself. What we add as a value is that we help to connect their files across time (from different years) or across these programs.

The survey programs mentioned above publish their data in the proprietary SPSS format. This file format can be imported and translated to R objects with the haven package; however, we needed to re-design haven’s labelled_spss class to maintain far more metadata, which, in turn, a modification of the labelled class. The haven package was designed and tested with data stored in individual SPSS files.

The author of labelled, Joseph Larmarange describes two main approaches to work with labelled data, such as SPSS’s method to store categorical data in the Introduction to labelled.

Two main approaches of labelled data conversion.

Our approach is a further extension of Approach B. Survey harmonization in our case always means the joining data from several SPSS files, which requires a consistent coding among several data sources. This means that data cleaning and recoding must take place before conversion to factors, character or numeric vectors. This is particularly important with factor data (and their simple character conversions) and numeric data that occasionally contains labels, for example, to describe the reason why certain data is missing. Our tutorial vignette labelled_spss_survey gives you more information about this.

In the next series of tutorials, we will deal with an array of problems. These are not for the faint heart – you need to have a solid intermediate level of R to follow.

Tidy, joined survey data

The original files identifiers may not be unique, we have to create new, truly unique identifiers. Weighting may not be straightforward.
Neither the number of observations or the number of variables (which represents the survey questions and their translation to coded data) is the same. Certain data may be only present in one survey and not the other. This means that you will likely to run loops on lists and not data.frames, but eventually you must carefully join them.

Class conversion

Similar questions may be imported from a non-native R format, in our case, from an SPSS files, in an inconsistent manner. SPSS’s variable formats cannot be translated unambiguously to R classes. retroharmonize introduced a new S3 class system that handles this problem, but eventually you will have to choose if you want to see a numeric or character coding of each categorical variable.
The harmonized surveys, with harmonized variable names and harmonized value labels, must be brought to consistent R representations (most statistical functions will only work on numeric, factor or character data) and carefully joined into a single data table for analysis.

Harmonization of variables and variable labels

Same variables may come with dissimilar variable names and variable labels. It may be a challenge to match age with age. We need to harmonize the names of variables.
The harmonized variables may have different labeling. One may call refused answers as declined and the other refusal. On a simple choice, climate change may be ‘Climate change’ or Problem: Climate change. Binary choices may have survey-specific coding conventions. Value labels must be harmonized. There are good tools to do this in a single file - but we have to work with several of them.

Missing value harmonization

There are likely to be various types of missing values. Working with missing values is probably where most human judgment is needed. Why are some answers missing: was the question not asked in some questionnaires? Is there a coding error? Did the respondent refuse the question, or sad that she did not have an answer? retroharmonize has a special labeled vector type that retains this information from the raw data, if it is present, but you must make the judgment yourself – in R, eventually you will either create a missing category, or use NA_character_ or NA_real_.

That’s a lot to put on your plate.

It is unlikely that you will be able to work with completely unfamiliar survey programs if you do not have a strong intermediate level of R. Our package comes with tutorials for Eurobarometer, Afrobarometer and our development version already covers Arab Barometer, highlighting some peculiar issues with these survey programs, that we hope to give a head start for less experienced R users.

Eurobarometer Surveys Used In Our Project

Wed, 03 Mar 2021 00:00:00 +0000

In our tutorial series, we are going to harmonize the following questionnaire items from five Eurobarometer harmonized survey files. The Eurobarometer survey files are harmonized across countries, but they are only partially harmonized in time.

All data must be downloaded from the GESIS Data Archive in Cologne. We are not affiliated with GESIS and you must read and accept their terms to use the data.

Eurobarometer 80.2 (2013)

GESIS Data Archive, Cologne. ZA5877 Data file Version 2.0.0, https://doi.org/10.4232/1.12792

Data file: ZA6595 data file (European Commission 2017).
Questionnaire: Eurobarometer 83.4 Basic Bilingual Questionnaire
Citation: ZA6595 Bibtex

QA1a Which of the following do you consider to be the single most serious problem facing the world as a whole? (single choice)

QA1b Which others do you consider to be serious problems? (multiple choice)

QA2 And how serious a problem do you think climate change is at this moment? Please use a scale from 1 to 10, with '1' meaning it is "not at all a serious problem (scale 1-10)

QA4 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU (agreement-disagreement 4-scale)

QA4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU could benefit the EU economically (agreement-disagreement 4-scale)

QA5 Have you personally taken any action to fight climate change over the past six months? (binary)

Eurobarometer 83.4 (2015)

European Commission, Brussels; Directorate General Communication COMM.A.1 ´Strategy, Corporate Communication Actions and Eurobarometer´GESIS Data Archive, Cologne. ZA6595 Data file Version 3.0.0, https://doi.org/10.4232/1.13146

Data file: ZA6595 data file (European Commission 2018).
Questionnaire: Eurobarometer 83.4 Basic Bilingual Questionnaire
Citation: ZA6595 Bibtex

Eurobarometer 87.1 (2017)

European Commission, Brussels; Directorate General Communication, COMM.A.1 ‘Strategic Communication’; European Parliament, Directorate-General for Communication, Public Opinion Monitoring Unit GESIS Data Archive, Cologne. ZA6861 Data file Version 1.2.0, https://doi.org/10.4232/1.12922

Data file: ZA6861 data file.
Questionnaire: Eurobarometer 90.2 Basic Bilingual Questionnaire
Citation: ZA6861 Bibtex

QC1a Which of the following do you consider to be the single most serious problem facing the world as a whole? (single choice)

QC1b Which others do you consider to be serious problems? (multiple choice)

QC2 And how serious a problem do you think climate change is at this moment? Please use a scale from 1 to 10, with '1' meaning it is "not at all a serious problem (scale 1-10)

Qc4 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU (agreement-disagreement 4-scale)

Qc4 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically (agreement-disagreement 4-scale)

Qc4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically (agreement-disagreement 4-scale)

Qc4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can increase the security of EU energy supplies (agreement-disagreement 4-scale)

Qc4 To what extent do you agree or disagree with each of the following statements? - More public financial support should be given to the transition to clean energies even if it means subsidies to fossil fuels should be reduced. (agreement-disagreement 4-scale)

Qc5 Have you personally taken any action to fight climate change over the past six months? (binary)

Eurobarometer 90.2 (2018)

European Commission, Brussels; Directorate General Communication, COMM.A.3 ‘Media Monitoring and Eurobarometer’ GESIS Data Archive, Cologne. ZA7488 Data file Version 1.0.0, https://doi.org/10.4232/1.13289

Data file: ZA7488 data file (European Commission 2019a)
Questionnaire: Eurobarometer 90.2 Basic Bilingual Questionnaire
Citation: ZA7488 Bibtex

QB5 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU (agreement-disagreement 4-scale)

QB5 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically (agreement-disagreement 4-scale)

QB5 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically (agreement-disagreement 4-scale)

QB5 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can increase the security of EU energy supplies (agreement-disagreement 4-scale)

QB5 To what extent do you agree or disagree with each of the following statements? - More public financial support should be given to the transition to clean energies even if it means subsidies to fossil fuels should be reduced. (agreement-disagreement 4-scale)

Eurobarometer 91.3 (2019)

European Commission, Brussels; Directorate General Communication, COMM.A.3 ‘Media Monitoring and Eurobarometer’ GESIS Data Archive, Cologne. ZA7572 Data file Version 1.0.0, https://doi.org/10.4232/1.13372

Data file: ZA7572 data file (European Commission 2019b).
Questionnaire: Eurobarometer 91.3 Basic Bilingual Questionnaire
Citation: ZA7572 Bibtex

QB4 To what extent do you agree or disagree with each of the following statements? - Taking action on climate change will lead to innovation that will make EU companies more competitive (N) (agreement-disagreement 4-scale)

QB4 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically (agreement-disagreement 4-scale)

QB4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically (agreement-disagreement 4-scale)

QB4 To what extent do you agree or disagree with each of the following statements? - Adapting to the adverse impacts of climate change can have positive outcomes for citizens in the EU (agreement-disagreement 4-scale)

QB5 Have you personally taken any action to fight climate change over the past six months? (binary)

References

European Commission, Brussels. 2017. “Eurobarometer 80.2 (2013).” GESIS Data Archive, Cologne. ZA5877 Data file Version 2.0.0, https://doi.org/10.4232/1.12792. https://doi.org/10.4232/1.12792.

———. 2018. “Eurobarometer 83.4 (2015).” GESIS Data Archive, Cologne. ZA6595 Data file Version 3.0.0, https://doi.org/10.4232/1.13146. https://doi.org/10.4232/1.13146.

———. 2019a. “Eurobarometer 90.2 (2018).” GESIS Data Archive, Cologne. ZA7488 Data file Version 1.0.0, https://doi.org/10.4232/1.13289. https://doi.org/10.4232/1.13289.

———. 2019b. “Eurobarometer 91.3 (2019).” GESIS Data Archive, Cologne. ZA7572 Data file Version 1.0.0, https://doi.org/10.4232/1.13372. https://doi.org/10.4232/1.13372.

Reproducible Survey Harmonization: retroharmonize Is Released

Mon, 21 Sep 2020 11:31:39 +0000

Our original intention was to make surveying more accessible for music and creative industry partners, by relying more on already existing survey data, and better designing complementary, smaller surveys, becasue surveying, opinion polling is becoming increasingly expensive in the develop world. People are less and less likely to sit down for an interview in their houses. We have tried to harmonize our custom surveys, particuarly with Kantar in Hungary and Focus in Slovakia with exisiting EU projects. But we ended up making a part of international survey harmonization across countries and throughout years easier to automate.

Surveys are like sensors for natural sciences and industrial production. They are essential for almost any social and economic statistical indicator, for calculating the inflation, parts of the GDP, participation in education programs. Making surveys easier to harmonize and exploit more already existing survey data can bring down research cost, and can increase research value at the same time. (See our earlier blog post Increase The Value Of Market Research With Open Data And Survey Harmonization.)

So, if you are an R user, you can use install.packages(“retroharmonize”) to get the released 0.1.13 version and make tutorials with real Eurobarometer or Afrobarometer microdata. With devtools::install_github("antaldaniel/retroharmonize") you can already install the current development version 0.1.14, which handles perl-like regex, which will be necessary for our next tutorial in the making for Arab Barometer.

Related:

retroharmonize R package for survey harmonization

Tue, 25 Aug 2020 00:00:00 +0000

Retrospective data harmonization

The aim of retroharmonize is to provide tools for reproducible retrospective (ex-post) harmonization of datasets that contain variables measuring the same concepts but coded in different ways. Ex-post data harmonization enables better use of existing data and creates new research opportunities. For example, harmonizing data from different countries enables cross-national comparisons, while merging data from different time points makes it possible to track changes over time.

Retrospective data harmonization is associated with challenges including conceptual issues with establishing equivalence and comparability, practical complications of having to standardize the naming and coding of variables, technical difficulties with merging data stored in different formats, and the need to document a large number of data transformations. The retroharmonize package assists with the latter three components, freeing up the capacity of researchers to focus on the first.

Specifically, the retroharmonize package proposes a reproducible workflow, including a new class for storing data together with the harmonized and original metadata, as well as functions for importing data from different formats, harmonizing data and metadata, documenting the harmonization process, and converting between data types. See here for an overview of the functionalities.

The new labelled_spss_survey() class is an extension of haven’s labelled_spss class. It not only preserves variable and value labels and the user-defined missing range, but also gives an identifier, for example, the filename or the wave number, to the vector. Additionally, it enables the preservation – as metadata attributes – of the original variable names, labels, and value codes and labels, from the source data, in addition to the harmonized variable names, labels, and value codes and labels. This way, the harmonized data also contain the pre-harmonization record. The stored original metadata can be used for validation and documentation purposes.

The vignette Working With The labelled_spss_survey Class provides more information about the labelled_spss_survey() class.

In Harmonize Value Labels we discuss the characteristics of the labelled_spss_survey() class and demonstrates the problems that using this class solves.

We also provide three extensive case studies illustrating how the retroharmonize package can be used for ex-post harmonization of data from cross-national surveys:

The creators of retroharmonize are not affiliated with either Afrobarometer, Arab Barometer, Eurobarometer, or the organizations that designs, produces or archives their surveys.

We started building an experimental APIs data is running retroharmonize regularly and improving known statistical data sources. See: Digital Music Observatory, Green Deal Data Observatory, Economy Data Observatory.

Citing the data sources

Our package has been tested on three harmonized survey’s microdata. Because retroharmonize is not affiliated with any of these data sources, to replicate our tutorials or work with the data, you have download the data files from these sources, and you have to cite those sources in your work.

Afrobarometer data: Cite Afrobarometer Arab Barometer data: cite Arab Barometer. Eurobarometer data: The Eurobarometer data Eurobarometer raw data and related documentation (questionnaires, codebooks, etc.) are made available by GESIS, ICPSR and through the Social Science Data Archive networks. You should cite your source, in our examples, we rely on the GESIS data files.

Citing the retroharmonize R package

For main developer and contributors, see the package homepage.

This work can be freely used, modified and distributed under the GPL-3 license:

citation("retroharmonize")
#> 
#> To cite package 'retroharmonize' in publications use:
#> 
#> Daniel Antal (2021). retroharmonize: Ex Post Survey Data
#> Harmonization. R package version 0.1.17.
#> https://retroharmonize.dataobservatory.eu/
#> 
#> A BibTeX entry for LaTeX users is
#> 
#> @Manual{,
#> title: {retroharmonize: Ex Post Survey Data Harmonization},
#> author: {Daniel Antal},
#> year: {2021},
#> doi: {10.5281/zenodo.5006056},
#> note: {R package version 0.1.17},
#> url: {https://retroharmonize.dataobservatory.eu/},
#> }

Contact

For contact information, contributors, see the package homepage.

Code of Conduct

Please note that the retroharmonize project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Click the Cite button above to demo the feature to enable visitors to import publication metadata into their reference management software.

CEEMID

Wed, 27 Feb 2019 00:00:00 +0000

CEEMID was multi-country project that was a predecessor of Reprex’s Digital Music Observatory. It transferred thousands of indicators to the Digital Music Observatory and offered it to the future European Music Observatory.

The CEEMID project (2014-2020) formed the basis of our first data observatory — the challenges of a 12-country data collection and sharing project forced our team to find novel software, data governance and data processing solution that fitted a very fragmented, multi-language music industry with many internal and external challenges. We showed that we can provide reach, data-driven evidence in seemingly data poor countries. We can find alternative data sources when there are no data vendors, no official government statistics are present, or when important data assets cannot be used due to business confidentiality clauses.

The Central European Music Industry Report 2020 concluded the CEEMID project showing that building meaningful statistical indicators for the live performance, recording and publishing sides of the music industry is possible even in seemingly data poor emerging and future markets. Our Report was presented as a best practice by the European Commission and the Geothe Institute in 2020.

In 2019 we offered CEEMID as one possible alternative to building the European Music Observatory. In our experience and understanding, the music industry has many failed international data projects because of the inherent conflicts of interests among big and large countries, authors and producers, producers, and performers, and in 2020 we launched our Digital Music Observatory.

From an early stage, there has been an interest in our solutions from newer and newer industries. During the validation of our observatory’s product/market fit, we realized that we gave a novel solution to a problem that is not at all unique to the music industry. The fragmentation of data assets among the institutional boundaries of small enterprises and NGOs, the fragmentation of research budgets that disallow comprehensive data collection problems, the reliance on questionable quality and hard-to-integrate public and third-party data is present in almost all business and policy domains.

The Case Study in Belgium brings together open data in a novel way from satellite, hydrological, opinion polling and tax administration data to show the geographical overlap of catastrophic drought and flood risk with the local political awareness and financial capacity to manage it. Our approach to music data turned out to be applicable in many situations when the data and the research capacities are fragmented.

We developed reliable software tools and know-how bring up reusable open government and open science data, which are available upon request in developed countries like raw diamonds, without the polishing of data scientists, documentation, or modern databases. We learned how to build a decentralized approach to data sharing and resource pooling using the agile open collaboration method of open-source software development. We became experts in handling legally open data and freedom of information requests on an industrial scale. After the launch of the Digital Music Observatory, we started building three new observatories with various partners in Computational Antitrust, Green Deal (climate change) and the broader Cultural Creative Sectors & Industries.

CEEMID did not only work with public and openly accessible data. We used these novel sources when other data was not available, legally could not be used, or it was prohibitively expensive. But most of the value we created for music rightsholders required the management of highly confidential data. We know that a public data observatory is not ideal for all potential users. We are piloting fully private observatories, and a blend of pubic (and therefore widely trustworthy) and proprietary hybrid observatories based on the experience of the ‘public’ and ‘private’ layers of CEEMID.

Historically CEEMID started out as the Central and Eastern European Music Industry Databases out of necessity following a CISAC Good Governance Seminar for European Societies in 2013, and eventually grew out of an abandoned GESAC project. The adoption of European single market and copyright rules, and the increased activity of competition authority and regulators required a more structured approach to set collective royalty and compensations tariffs in a region that was regarded traditionally as data-poor with lower quantity of industry and government data sources available.

In 2014 three societies, Artisjus, HDS and SOZA realized that need to make further efforts to modernize the way they measure their own economic impact, the economic value of their licenses to remain competitive in advocating the interests vis-à-vis domestic governments, international organizations like CISAC and GESAC and the European Union. They signed a Memorandum of Understanding with their consultant to set up the CEEMID databases and to harmonize their efforts. The adoption of European single market and copyright rules, and the increased activity of competition authority and regulators required a more structured approach to set collective royalty and compensations tariffs in a region that was regarded traditionally as data-poor with lower quantity of industry and government data sources available, but quickly covered the whole European area (Read more about our data coverage and our pan-European geographical coverage.)

We believe that CEEMID could fulfill the functions of the European Music Industry if it would find a sustainable financing that makes access to all our data open. We need to be able to avoid the tragedy of the commons, where only a few industry users contribute to the financing of thousands of indicators that could potentially benefits tens of thousands of stakeholders in the EU.