<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>surveys | Reprex</title><link>https://reprex-next.netlify.app/tag/surveys/</link><atom:link href="https://reprex-next.netlify.app/tag/surveys/index.xml" rel="self" type="application/rss+xml"/><description>surveys</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sat, 06 Mar 2021 00:00:00 +0000</lastBuildDate><image><url>https://reprex-next.netlify.app/media/icon_hub9491570ac57158c0eeecc95c95b13e5_20247_512x512_fill_lanczos_center_3.png</url><title>surveys</title><link>https://reprex-next.netlify.app/tag/surveys/</link></image><item><title>Where Are People More Likely To Treat Climate Change as the Most Serious Global Problem?</title><link>https://reprex-next.netlify.app/post/2021-03-06-individual-join/</link><pubDate>Sat, 06 Mar 2021 00:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-03-06-individual-join/</guid><description>&lt;pre>&lt;code>library(regions)
library(lubridate)
library(dplyr)
if ( dir.exists('data-raw') ) {
data_raw_dir &amp;lt;- &amp;quot;data-raw&amp;quot;
} else {
data_raw_dir &amp;lt;- file.path(&amp;quot;..&amp;quot;, &amp;quot;..&amp;quot;, &amp;quot;data-raw&amp;quot;)
}
&lt;/code>&lt;/pre>
&lt;p>The first results of our longitudinal table &lt;a href="post/2021-03-05-retroharmonize-climate/">were difficult to
map&lt;/a>, because the surveys used
an obsolete regional coding. We will correct the coding where
possible, and join the data with the European Environment Agency’s (EEA)
Air Quality e-Reporting (AQ e-Reporting) data on environmental
pollution. We recoded the annual level for every available reporting
station [&lt;em>not shown here&lt;/em>]; all values are in μg/m3. The period
under observation is 2014-2016. Data file:
&lt;a href="https://www.eea.europa.eu/data-and-maps/data/aqereporting-8" target="_blank" rel="noopener">https://www.eea.europa.eu/data-and-maps/data/aqereporting-8&lt;/a> (European
Environment Agency 2021).&lt;/p>
&lt;h2 id="recoding-the-regions">Recoding the Regions&lt;/h2>
&lt;p>Recoding means that the boundaries are unchanged, but the country
changed the names and codes of regions because there were other boundary
changes which did not affect our observation unit. We explain the
problem and the solution in greater detail in &lt;a href="http://netzero.dataobservatory.eu/post/2021-03-06-regions-climate/" target="_blank" rel="noopener">our
tutorial&lt;/a>
that aggregates the data on regional levels.&lt;/p>
&lt;pre>&lt;code>panel &amp;lt;- readRDS(file.path(data_raw_dir, &amp;quot;climate-panel.rds&amp;quot;))
climate_data_geocode &amp;lt;- panel %&amp;gt;%
mutate ( year = lubridate::year(date_of_interview)) %&amp;gt;%
recode_nuts()
&lt;/code>&lt;/pre>
&lt;p>Let’s load the air pollution data and join it by the corrected geocodes:&lt;/p>
&lt;pre>&lt;code>load(file.path(&amp;quot;data&amp;quot;, &amp;quot;air_pollutants.rda&amp;quot;)) ## good practice to use system-independent file.path
climate_awareness_air &amp;lt;- climate_data_geocode %&amp;gt;%
  rename ( region_nuts_codes = .data$code_2016) %&amp;gt;%
  left_join ( air_pollutants, by = &amp;quot;region_nuts_codes&amp;quot; ) %&amp;gt;%
  select ( -all_of(c(&amp;quot;w1&amp;quot;, &amp;quot;wex&amp;quot;, &amp;quot;date_of_interview&amp;quot;,
                     &amp;quot;typology&amp;quot;, &amp;quot;typology_change&amp;quot;, &amp;quot;geo&amp;quot;, &amp;quot;region&amp;quot;))) %&amp;gt;%
  mutate (
    # remove special labels and create NA_real_
    age_education = retroharmonize::as_numeric(age_education)) %&amp;gt;%
  mutate_if ( is.character, as.factor) %&amp;gt;%
  mutate (
    # we only have responses from 4 years, and this should be treated as a categorical variable
    year = as.factor(year)
  ) %&amp;gt;%
  filter ( complete.cases(.) )
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>climate_awareness_air&lt;/code> data frame contains the answers of 75086
individual respondents. 17.07% thought that climate change was the most
serious world problem and 33.6% mentioned climate change as one of the
three most important global problems.&lt;/p>
&lt;pre>&lt;code>summary ( climate_awareness_air )
## rowid serious_world_problems_first
## ZA5877_v2-0-0_1 : 1 Min. :0.0000
## ZA5877_v2-0-0_10 : 1 1st Qu.:0.0000
## ZA5877_v2-0-0_100 : 1 Median :0.0000
## ZA5877_v2-0-0_1000 : 1 Mean :0.1707
## ZA5877_v2-0-0_10000: 1 3rd Qu.:0.0000
## ZA5877_v2-0-0_10001: 1 Max. :1.0000
## (Other) :75080
## serious_world_problems_climate_change isocntry
## Min. :0.000 BE : 3028
## 1st Qu.:0.000 CZ : 3023
## Median :0.000 NL : 3019
## Mean :0.336 SK : 3000
## 3rd Qu.:1.000 SE : 2980
## Max. :1.000 DE-W : 2978
## (Other):57058
## marital_status age_education
## (Re-)Married: without children :13242 18 :15485
## (Re-)Married: children this marriage :12696 19 : 7728
## Single: without children : 7650 16 : 5840
## (Re-)Married: w children of this marriage: 6520 still studying: 5098
## (Re-)Married: living without children : 6225 17 : 5092
## Single: living without children : 4102 15 : 4528
## (Other) :24651 (Other) :31315
## age_exact occupation_of_respondent
## Min. :15.0 Retired, unable to work :22911
## 1st Qu.:36.0 Skilled manual worker : 6774
## Median :51.0 Employed position, at desk : 6716
## Mean :50.1 Employed position, service job: 5624
## 3rd Qu.:65.0 Middle management, etc. : 5252
## Max. :99.0 Student : 5098
## (Other) :22711
## occupation_of_respondent_recoded
## Employed (10-18 in d15a) :32763
## Not working (1-4 in d15a) :37125
## Self-employed (5-9 in d15a): 5198
##
##
##
##
## respondent_occupation_scale_c_14
## Retired (4 in d15a) :22911
## Manual workers (15 to 18 in d15a) :15269
## Other white collars (13 or 14 in d15a): 9203
## Managers (10 to 12 in d15a) : 8291
## Self-employed (5 to 9 in d15a) : 5198
## Students (2 in d15a) : 5098
## (Other) : 9116
## type_of_community is_student no_education
## DK : 34 Min. :0.0000 Min. :0.000000
## Large town :20939 1st Qu.:0.0000 1st Qu.:0.000000
## Rural area or village :24686 Median :0.0000 Median :0.000000
## Small or middle sized town: 9850 Mean :0.0679 Mean :0.008151
## Small/middle town :19577 3rd Qu.:0.0000 3rd Qu.:0.000000
## Max. :1.0000 Max. :1.000000
##
## education year region_nuts_codes country_code
## Min. :14.00 2013:25103 LU : 1432 DE : 4531
## 1st Qu.:17.00 2015: 0 MT : 1398 GB : 3538
## Median :18.00 2017:25053 CY : 1192 BE : 3028
## Mean :19.61 2019:24930 SK02 : 1053 CZ : 3023
## 3rd Qu.:22.00 EL30 : 974 NL : 3019
## Max. :30.00 EE : 973 SK : 3000
## (Other):68064 (Other):54947
## pm2_5 pm10 o3 BaP
## Min. : 2.109 Min. : 5.883 Min. : 66.37 Min. :0.0102
## 1st Qu.: 9.374 1st Qu.: 28.326 1st Qu.: 90.89 1st Qu.:0.1779
## Median :11.866 Median : 33.673 Median :102.81 Median :0.4105
## Mean :12.954 Mean : 38.637 Mean :101.49 Mean :0.8759
## 3rd Qu.:15.890 3rd Qu.: 49.488 3rd Qu.:110.73 3rd Qu.:1.0692
## Max. :41.293 Max. :123.239 Max. :141.04 Max. :7.8050
##
## so2 ap_pc1 ap_pc2 ap_pc3
## Min. : 0.0000 Min. :-4.6669 Min. :-2.21851 Min. :-2.1007
## 1st Qu.: 0.0000 1st Qu.:-0.4624 1st Qu.:-0.49130 1st Qu.:-0.5695
## Median : 0.0000 Median : 0.4263 Median : 0.02902 Median :-0.1113
## Mean : 0.1032 Mean : 0.1031 Mean : 0.04166 Mean :-0.1746
## 3rd Qu.: 0.0000 3rd Qu.: 0.9748 3rd Qu.: 0.57416 3rd Qu.: 0.3309
## Max. :42.5325 Max. : 2.0344 Max. : 3.25841 Max. : 4.1615
##
## ap_pc4 ap_pc5
## Min. :-1.7387 Min. :-2.75079
## 1st Qu.:-0.1669 1st Qu.:-0.18748
## Median : 0.0371 Median : 0.01811
## Mean : 0.1154 Mean : 0.06797
## 3rd Qu.: 0.3050 3rd Qu.: 0.34937
## Max. : 3.2476 Max. : 1.42816
##
&lt;/code>&lt;/pre>
&lt;p>Let’s see a simple CART tree! We remove the regional codes, because
regional climate awareness differs very strongly.
These differences, together with education level and the survey year,
are the most important predictors of thinking about
climate change as the most important global problem in Europe.&lt;/p>
&lt;pre>&lt;code># Classification Tree with rpart
library(rpart)
# grow tree
fit &amp;lt;- rpart(as.factor(serious_world_problems_first) ~ .,
             method=&amp;quot;class&amp;quot;, data=climate_awareness_air %&amp;gt;%
               select ( - all_of(c(&amp;quot;rowid&amp;quot;, &amp;quot;region_nuts_codes&amp;quot;))),
             control = rpart.control(cp = 0.005))
printcp(fit) # display the results
##
## Classification tree:
## rpart(formula = as.factor(serious_world_problems_first) ~ .,
## data = climate_awareness_air %&amp;gt;% select(-all_of(c(&amp;quot;rowid&amp;quot;,
## &amp;quot;region_nuts_codes&amp;quot;))), method = &amp;quot;class&amp;quot;, control = rpart.control(cp = 0.005))
##
## Variables actually used in tree construction:
## [1] age_education isocntry
## [3] serious_world_problems_climate_change year
##
## Root node error: 12817/75086 = 0.1707
##
## n= 75086
##
## CP nsplit rel error xerror xstd
## 1 0.0240566 0 1.00000 1.00000 0.0080438
## 2 0.0082703 3 0.92783 0.92783 0.0078055
## 3 0.0050000 5 0.91129 0.91425 0.0077588
plotcp(fit) # visualize cross-validation results
&lt;/code>&lt;/pre>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="&amp;amp;ldquo;Visualize cross-validation results&amp;amp;rdquo;" srcset="
/post/2021-03-06-individual-join/rpart-1_hu9f1f775a32eec3a67a573c0d2df50ef4_4271_8ce48ac0f7ba6b1d3752385b96368cc3.webp 400w,
/post/2021-03-06-individual-join/rpart-1_hu9f1f775a32eec3a67a573c0d2df50ef4_4271_b20e6dca7fcadd4576da216956498a35.webp 760w,
/post/2021-03-06-individual-join/rpart-1_hu9f1f775a32eec3a67a573c0d2df50ef4_4271_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/post/2021-03-06-individual-join/rpart-1_hu9f1f775a32eec3a67a573c0d2df50ef4_4271_8ce48ac0f7ba6b1d3752385b96368cc3.webp"
width="672"
height="480"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;pre>&lt;code>summary(fit) # detailed summary of splits
## Call:
## rpart(formula = as.factor(serious_world_problems_first) ~ .,
## data = climate_awareness_air %&amp;gt;% select(-all_of(c(&amp;quot;rowid&amp;quot;,
## &amp;quot;region_nuts_codes&amp;quot;))), method = &amp;quot;class&amp;quot;, control = rpart.control(cp = 0.005))
## n= 75086
##
## CP nsplit rel error xerror xstd
## 1 0.024056592 0 1.0000000 1.0000000 0.008043837
## 2 0.008270266 3 0.9278302 0.9278302 0.007805478
## 3 0.005000000 5 0.9112897 0.9142545 0.007758824
##
## Variable importance
## serious_world_problems_climate_change isocntry
## 31 26
## country_code BaP
## 20 8
## pm2_5 ap_pc1
## 4 3
## age_education pm10
## 2 2
## education ap_pc2
## 2 1
## year
## 1
##
## Node number 1: 75086 observations, complexity param=0.02405659
## predicted class=0 expected loss=0.1706976 P(node): 1
## class counts: 62269 12817
## probabilities: 0.829 0.171
## left son=2 (25229 obs) right son=3 (49857 obs)
## Primary splits:
## serious_world_problems_climate_change &amp;lt; 0.5 to the right, improve=2214.2040, (0 missing)
## isocntry splits as RRLLLRRRLLRLRLLLLLLLLLLRRLLLRLL, improve= 728.0160, (0 missing)
## country_code splits as RRLLLRRLLRLLLLLLLLLLRRLLLRLL, improve= 673.3656, (0 missing)
## BaP &amp;lt; 0.4300347 to the right, improve= 310.6229, (0 missing)
## pm2_5 &amp;lt; 13.38264 to the right, improve= 296.4013, (0 missing)
## Surrogate splits:
## age_education splits as ----RRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRRRRRR-RRRRRL-RRR-RRRRRRRRR--RRRLLR--R-R, agree=0.664, adj=0, (0 split)
## pm10 &amp;lt; 7.491315 to the left, agree=0.664, adj=0, (0 split)
##
## Node number 2: 25229 observations
## predicted class=0 expected loss=0 P(node): 0.3360014
## class counts: 25229 0
## probabilities: 1.000 0.000
##
## Node number 3: 49857 observations, complexity param=0.02405659
## predicted class=0 expected loss=0.2570752 P(node): 0.6639986
## class counts: 37040 12817
## probabilities: 0.743 0.257
## left son=6 (34631 obs) right son=7 (15226 obs)
## Primary splits:
## isocntry splits as RRLLLRRRLLRLRLLLLLLLLLLRRLLLRLL, improve=1454.9460, (0 missing)
## country_code splits as RRLLLRRLLRLLLLLLLLLLRRLLLRLL, improve=1359.7210, (0 missing)
## BaP &amp;lt; 0.4300347 to the right, improve= 629.8844, (0 missing)
## pm2_5 &amp;lt; 13.38264 to the right, improve= 555.7484, (0 missing)
## ap_pc1 &amp;lt; -0.005459537 to the left, improve= 533.3579, (0 missing)
## Surrogate splits:
## country_code splits as RRLLLRRLLRLLLLLLLLLLRRLLLRLL, agree=0.987, adj=0.957, (0 split)
## BaP &amp;lt; 0.1749425 to the right, agree=0.775, adj=0.264, (0 split)
## pm2_5 &amp;lt; 5.206993 to the right, agree=0.737, adj=0.140, (0 split)
## ap_pc1 &amp;lt; 1.405527 to the left, agree=0.733, adj=0.126, (0 split)
## pm10 &amp;lt; 25.31211 to the right, agree=0.718, adj=0.076, (0 split)
##
## Node number 6: 34631 observations
## predicted class=0 expected loss=0.1769802 P(node): 0.4612178
## class counts: 28502 6129
## probabilities: 0.823 0.177
##
## Node number 7: 15226 observations, complexity param=0.02405659
## predicted class=0 expected loss=0.4392487 P(node): 0.2027808
## class counts: 8538 6688
## probabilities: 0.561 0.439
## left son=14 (11607 obs) right son=15 (3619 obs)
## Primary splits:
## isocntry splits as LL---LLR--L-L----------LL---R--, improve=337.5462, (0 missing)
## country_code splits as LL---LR--L-L--------LL---R--, improve=337.5462, (0 missing)
## age_education splits as ----LLLLLL-LLLRRRRRRR-RRRRRRRRRL-RRRRRRLLRR-RRRRLLRLRL-RRLRRR-RRR-LLLLRRR-----LR-----L-R, improve=294.0807, (0 missing)
## education &amp;lt; 22.5 to the left, improve=262.3747, (0 missing)
## BaP &amp;lt; 0.053328 to the right, improve=232.7043, (0 missing)
## Surrogate splits:
## BaP &amp;lt; 0.053328 to the right, agree=0.878, adj=0.485, (0 split)
## pm2_5 &amp;lt; 4.810361 to the right, agree=0.827, adj=0.271, (0 split)
## ap_pc2 &amp;lt; 0.8746175 to the left, agree=0.792, adj=0.124, (0 split)
## so2 &amp;lt; 0.3302972 to the left, agree=0.781, adj=0.078, (0 split)
## age_education splits as ----LLLLLL-LLLLLLLRLR-LRRLRRRRRR-RRRRLLLLLR-LRLRLLRRLL-LLRLLR-LLR-RRLLLLL-----RR-----R-L, agree=0.779, adj=0.071, (0 split)
##
## Node number 14: 11607 observations, complexity param=0.008270266
## predicted class=0 expected loss=0.3804601 P(node): 0.1545827
## class counts: 7191 4416
## probabilities: 0.620 0.380
## left son=28 (7462 obs) right son=29 (4145 obs)
## Primary splits:
## age_education splits as ----LLLLLL-LRRRRRRRRR-RRLRRLRRLL-RRRRLRLLRR-RLRLLLRLRL-RR-RR--RRL-L-LLRRR------------L-R, improve=123.71070, (0 missing)
## year splits as R-LR, improve=107.79460, (0 missing)
## education &amp;lt; 20.5 to the left, improve= 90.28724, (0 missing)
## occupation_of_respondent splits as LRRLRRRRRLRLLLRLLL, improve= 84.62865, (0 missing)
## respondent_occupation_scale_c_14 splits as LRLLLRRL, improve= 68.88653, (0 missing)
## Surrogate splits:
## education &amp;lt; 20.5 to the left, agree=0.950, adj=0.861, (0 split)
## occupation_of_respondent splits as LLLLRLLRRLRLLLRLLL, agree=0.738, adj=0.267, (0 split)
## respondent_occupation_scale_c_14 splits as LRLLLLRL, agree=0.733, adj=0.251, (0 split)
## is_student &amp;lt; 0.5 to the left, agree=0.709, adj=0.186, (0 split)
## age_exact &amp;lt; 23.5 to the right, agree=0.676, adj=0.094, (0 split)
##
## Node number 15: 3619 observations
## predicted class=1 expected loss=0.3722023 P(node): 0.04819807
## class counts: 1347 2272
## probabilities: 0.372 0.628
##
## Node number 28: 7462 observations
## predicted class=0 expected loss=0.326052 P(node): 0.09937938
## class counts: 5029 2433
## probabilities: 0.674 0.326
##
## Node number 29: 4145 observations, complexity param=0.008270266
## predicted class=0 expected loss=0.4784077 P(node): 0.05520337
## class counts: 2162 1983
## probabilities: 0.522 0.478
## left son=58 (2573 obs) right son=59 (1572 obs)
## Primary splits:
## year splits as L-LR, improve=40.13885, (0 missing)
## occupation_of_respondent splits as LRLLRRRRRLRLLLRLLL, improve=18.33254, (0 missing)
## marital_status splits as LRRRLRRRLRRLRLLRRRRRRLRLRLLRR, improve=17.86888, (0 missing)
## type_of_community splits as LRLRL, improve=17.55254, (0 missing)
## age_education splits as ------------LLRRRRRRR-RR-RL-RR---LRRR-R--LR-R-R---R-R--RR-RR--RR------RRR--------------R, improve=14.66121, (0 missing)
## Surrogate splits:
## type_of_community splits as LLLRL, agree=0.777, adj=0.412, (0 split)
## marital_status splits as RRLLLLLRLLLLLLLRRRLLLLLLRLRLL, agree=0.680, adj=0.155, (0 split)
## isocntry splits as LL---LL---L-R----------LL------, agree=0.669, adj=0.127, (0 split)
## country_code splits as LL---L---L-R--------LL------, agree=0.669, adj=0.127, (0 split)
## o3 &amp;lt; 83.06345 to the right, agree=0.650, adj=0.076, (0 split)
##
## Node number 58: 2573 observations
## predicted class=0 expected loss=0.4240187 P(node): 0.03426737
## class counts: 1482 1091
## probabilities: 0.576 0.424
##
## Node number 59: 1572 observations
## predicted class=1 expected loss=0.43257 P(node): 0.02093599
## class counts: 680 892
## probabilities: 0.433 0.567
# plot tree
plot(fit, uniform=TRUE,
main=&amp;quot;Classification Tree: Climate Change Is The Most Serious Threat&amp;quot;)
text(fit, use.n=TRUE, all=TRUE, cex=.8)
## Warning in labels.rpart(x, minlength = minlength): more than 52 levels in a
## predicting factor, truncated for printout
&lt;/code>&lt;/pre>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Classification Tree: Climate Change Is The Most Serious Threat" srcset="
/post/2021-03-06-individual-join/rpart-2_hu8765078af843fd2a25e4b77d7cba4bfb_9882_0bdd94d7f6c1efcc2575c1adeb6917c8.webp 400w,
/post/2021-03-06-individual-join/rpart-2_hu8765078af843fd2a25e4b77d7cba4bfb_9882_daf3b553e16b54a4b23a242bc9ef1e6b.webp 760w,
/post/2021-03-06-individual-join/rpart-2_hu8765078af843fd2a25e4b77d7cba4bfb_9882_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/post/2021-03-06-individual-join/rpart-2_hu8765078af843fd2a25e4b77d7cba4bfb_9882_0bdd94d7f6c1efcc2575c1adeb6917c8.webp"
width="672"
height="480"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;pre>&lt;code>saveRDS ( climate_awareness_air , file.path(tempdir(), &amp;quot;climate_panel_recoded.rds&amp;quot;), version = 2)
# not evaluated
saveRDS( climate_awareness_air, file = file.path(&amp;quot;data-raw&amp;quot;, &amp;quot;climate-panel_recoded.rds&amp;quot;))
&lt;/code>&lt;/pre></description></item><item><title>What is Retrospective Survey Harmonization?</title><link>https://reprex-next.netlify.app/post/2021-03-04_retroharmonize_intro/</link><pubDate>Thu, 04 Mar 2021 00:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-03-04_retroharmonize_intro/</guid><description>&lt;h2 id="reproducible-ex-post-harmonization-of-survey-microdata">Reproducible ex post harmonization of survey microdata&lt;/h2>
&lt;p>Retrospective survey harmonization allows the comparison of opinion poll
data collected in different countries or at different times. In this example we are
working with data from surveys that were ex ante harmonized to a certain
degree – in our tutorials we choose questions that were asked in
the same way in many natural languages. For example, you can compare
what percentage of Europeans in various countries, provinces
and regions thought climate change was a serious world problem back in
2013, 2015, 2017 and 2019.&lt;/p>
&lt;p>We developed the
&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">retroharmonize&lt;/a> R package
to help with this process. We have tested the package extensively with about 80
Eurobarometer and 5 Afrobarometer survey files, and to a lesser extent with
Arab Barometer files. This allows the comparison of various survey
answers in about 70 countries. These policy-oriented survey programs were
designed to be harmonized to a certain degree, but their ex post
harmonization is still necessary, challenging and error-prone.
Retrospective harmonization includes harmonizing the different
coding used for questions and answer options, the post-stratification
weights, and the different file formats.&lt;/p>
&lt;p>&lt;a href="https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm" target="_blank" rel="noopener">Eurobarometer&lt;/a>,
&lt;a href="https://www.afrobarometer.org/" target="_blank" rel="noopener">Afrobarometer&lt;/a>, &lt;a href="https://www.arabbarometer.org/" target="_blank" rel="noopener">Arab
Barometer&lt;/a> and
&lt;a href="https://www.latinobarometro.org/lat.jsp" target="_blank" rel="noopener">Latinobarómetro&lt;/a> make survey
files that are harmonized across countries available for research under
various terms. Our
&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">retroharmonize&lt;/a> package is not
affiliated with them, and to run our examples, you must visit their
websites, carefully read their terms, agree to them, and download their
data yourself. The value we add is that we help to connect their
files across time (from different years) or across these programs.&lt;/p>
&lt;p>The survey programs mentioned above publish their data in the
proprietary SPSS format. This file format can be imported and translated
to R objects with the haven package; however, we needed to re-design
&lt;a href="https://haven.tidyverse.org/" target="_blank" rel="noopener">haven’s&lt;/a>
&lt;a href="https://haven.tidyverse.org/reference/labelled_spss.html" target="_blank" rel="noopener">labelled_spss&lt;/a>
class, itself a modification of
the &lt;a href="">labelled&lt;/a> class, to maintain far more metadata. The haven package was designed and tested with
data stored in individual SPSS files.&lt;/p>
&lt;p>Joseph Larmarange, the author of labelled, describes two main approaches
to working with labelled data, such as SPSS’s method of storing categorical
data, in the &lt;a href="http://larmarange.github.io/labelled/articles/intro_labelled.html" target="_blank" rel="noopener">Introduction to
labelled&lt;/a>.&lt;/p>
&lt;figure id="figure-two-main-approaches-of-labelled-data-conversion">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="img/larmarange_approaches_to_labelled.png" alt="Two main approaches of labelled data conversion." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Two main approaches of labelled data conversion.
&lt;/figcaption>&lt;/figure>
&lt;p>Our approach is a further extension of &lt;strong>Approach B&lt;/strong>. Survey
harmonization in our case always means joining data from several
SPSS files, which requires a consistent coding among several data
sources. This means that data cleaning and recoding must take place
before conversion to factors, character or numeric vectors. This is
particularly important with factor data (and their simple character
conversions) and numeric data that occasionally contains labels, for
example, to describe the reason why certain data is missing. Our
tutorial vignette
&lt;a href="https://retroharmonize.dataobservatory.eu/articles/labelled_spss_survey.html" target="_blank" rel="noopener">labelled_spss_survey&lt;/a>
gives you more information about this.&lt;/p>
&lt;p>In the next series of tutorials, we will deal with an array of problems.
These are not for the faint of heart – you need to have a solid
intermediate level of R to follow.&lt;/p>
&lt;h2 id="tidy-joined-survey-data">Tidy, joined survey data&lt;/h2>
&lt;ul>
&lt;li>The original files’ identifiers may not be unique, so we have to create
new, truly unique identifiers. Weighting may not be straightforward.&lt;/li>
&lt;li>Neither the number of observations nor the number of variables (which
represent the survey questions and their translation to coded data)
is the same. Certain data may be present in only one survey and not
the other. This means that you will likely run loops on lists and
not data.frames, but eventually you must carefully join them.&lt;/li>
&lt;/ul>
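&lt;p>As a minimal sketch of the two points above, the following base R code
gives two small, made-up survey waves unique identifiers and then stacks
them even though their variables differ. The data frames, the survey names
&lt;code>ZA_2013&lt;/code> and &lt;code>ZA_2015&lt;/code>, and all variable names are
illustrative assumptions; they are not real survey files and not part of
&lt;code>retroharmonize&lt;/code>.&lt;/p>
&lt;pre>&lt;code># Two hypothetical survey waves: the ids clash and the variables differ
surveys &amp;lt;- list(
  ZA_2013 = data.frame(id = 1:3, age = c(34, 51, 27), qa1a = c(1, 0, 1)),
  ZA_2015 = data.frame(id = 1:2, age = c(45, 62), qb5 = c(2, 3))
)

# Prefix each id with the survey name to create truly unique identifiers
surveys &amp;lt;- lapply(names(surveys), function(survey_name) {
  survey &amp;lt;- surveys[[survey_name]]
  survey$rowid &amp;lt;- paste(survey_name, survey$id, sep = &amp;quot;_&amp;quot;)
  survey
})

# Add the variables a survey lacks as NA, so that all columns line up
all_vars &amp;lt;- unique(unlist(lapply(surveys, names)))
surveys &amp;lt;- lapply(surveys, function(survey) {
  for (v in setdiff(all_vars, names(survey))) survey[[v]] &amp;lt;- NA
  survey[all_vars]
})

# Only now join the list elements into a single data frame
joined &amp;lt;- do.call(rbind, surveys)
&lt;/code>&lt;/pre>
&lt;p>Working on a list until the very last step keeps each survey intact
while its identifiers and columns are being aligned.&lt;/p>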
&lt;h2 id="class-conversion">Class conversion&lt;/h2>
&lt;ul>
&lt;li>Similar questions may be imported from a non-native R format, in our
case, from SPSS files, in an inconsistent manner. SPSS’s variable
formats cannot be translated unambiguously to R classes.
&lt;code>retroharmonize&lt;/code> introduced a new S3 class system that handles this
problem, but eventually you will have to choose if you want to see a
numeric or character coding of each categorical variable.&lt;/li>
&lt;li>The harmonized surveys, with harmonized variable names and
harmonized value labels, must be brought to consistent R
representations (most statistical functions will only work on
numeric, factor or character data) and carefully joined into a
single data table for analysis.&lt;/li>
&lt;/ul>
&lt;h2 id="harmonization-of-variables-and-variable-labels">Harmonization of variables and variable labels&lt;/h2>
&lt;ul>
&lt;li>The same variables may come with dissimilar variable names and variable
labels. It may be a challenge to match age with age. We need to
harmonize the names of variables.&lt;/li>
&lt;li>The harmonized variables may have different labeling. One survey may
label refused answers &lt;code>declined&lt;/code> and another &lt;code>refusal&lt;/code>. On a simple
choice, climate change may be ‘Climate change’ or
&lt;code>Problem: Climate change&lt;/code>. Binary choices may have survey-specific
coding conventions. Value labels must be harmonized. There are good
tools to do this in a single file - but we have to work with several
of them.&lt;/li>
&lt;/ul>
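&lt;p>The value label problem described above can be sketched in base R as
follows. The two label vectors and the helper
&lt;code>harmonize_labels()&lt;/code> are hypothetical and only illustrate the
idea; they are not functions or data from &lt;code>retroharmonize&lt;/code>.&lt;/p>
&lt;pre>&lt;code># Hypothetical raw value labels of the same question in two surveys
survey_a &amp;lt;- c(&amp;quot;Climate change&amp;quot;, &amp;quot;declined&amp;quot;, &amp;quot;Poverty&amp;quot;)
survey_b &amp;lt;- c(&amp;quot;Problem: Climate change&amp;quot;, &amp;quot;refusal&amp;quot;, &amp;quot;Problem: Poverty&amp;quot;)

# Illustrative helper: bring both labelings to one convention
harmonize_labels &amp;lt;- function(x) {
  x &amp;lt;- tolower(trimws(x))
  x &amp;lt;- sub(&amp;quot;^problem: &amp;quot;, &amp;quot;&amp;quot;, x)                        # drop the survey-specific prefix
  x[x %in% c(&amp;quot;declined&amp;quot;, &amp;quot;refusal&amp;quot;)] &amp;lt;- &amp;quot;declined&amp;quot;  # one label for refusals
  x
}

# After harmonization the two surveys use identical labels
identical(harmonize_labels(survey_a), harmonize_labels(survey_b))
&lt;/code>&lt;/pre>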
&lt;h2 id="missing-value-harmonization">Missing value harmonization&lt;/h2>
&lt;ul>
&lt;li>There are likely to be various types of &lt;code>missing values&lt;/code>. Working
with missing values is probably where most human judgment is needed.
Why are some answers missing: was the question not asked in some
questionnaires? Is there a coding error? Did the respondent refuse to answer
the question, or say that she did not have an answer?
&lt;code>retroharmonize&lt;/code> has a special labeled vector type that retains this
information from the raw data, if it is present, but you must make
the judgment yourself – in R, eventually you will either create a
missing category, or use &lt;code>NA_character_&lt;/code> or &lt;code>NA_real_&lt;/code>.&lt;/li>
&lt;/ul>
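&lt;p>The final choice mentioned above, between a missing category and
&lt;code>NA_real_&lt;/code>, might look like this in base R. The codes 998 and
999 and their meanings are assumptions made up for this sketch; real
surveys document their own missing value codes.&lt;/p>
&lt;pre>&lt;code># Hypothetical coding: 1-10 are valid answers, 998 = refused, 999 = not asked
answers &amp;lt;- c(3, 10, 998, 7, 999)
missing_codes &amp;lt;- c(refused = 998, not_asked = 999)

# Option 1: a numeric vector in which every special code becomes NA_real_
answers_numeric &amp;lt;- ifelse(answers %in% missing_codes, NA_real_, answers)
mean(answers_numeric, na.rm = TRUE)

# Option 2: a factor that keeps the reason for missingness as its own category
answers_factor &amp;lt;- factor(answers,
                         levels = c(1:10, missing_codes),
                         labels = c(1:10, names(missing_codes)))
table(answers_factor)
&lt;/code>&lt;/pre>
&lt;p>The numeric version is convenient for averages, while the factor
version retains why an answer is missing.&lt;/p>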
&lt;p>That’s a lot to put on your plate.&lt;/p>
&lt;p>It is unlikely that you will be able to work with completely unfamiliar
survey programs if you do not have a strong intermediate level of R. Our
package comes with tutorials for
&lt;a href="https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html" target="_blank" rel="noopener">Eurobarometer&lt;/a>,
&lt;a href="https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html" target="_blank" rel="noopener">Afrobarometer&lt;/a>
and our development version already covers Arab Barometer, highlighting
some peculiar issues with these survey programs; we hope these give a
head start for less experienced R users.&lt;/p></description></item><item><title>Eurobarometer Surveys Used In Our Project</title><link>https://reprex-next.netlify.app/post/2021-03-04-eurobarometer_data/</link><pubDate>Wed, 03 Mar 2021 00:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-03-04-eurobarometer_data/</guid><description>&lt;p>In our &lt;a href="https://reprex-next.netlify.app/post/2021-03-04_retroharmonize_intro/">tutorial
series&lt;/a>,
we are going to harmonize the following questionnaire items from five
Eurobarometer harmonized survey files. The Eurobarometer survey files
are harmonized across countries, but they are only partially harmonized
in time.&lt;/p>
&lt;p>All data must be downloaded from the
&lt;a href="https://www.gesis.org/en/home" target="_blank" rel="noopener">GESIS&lt;/a> Data Archive in Cologne. We are
not affiliated with GESIS and you must read and accept their terms to
use the data.&lt;/p>
&lt;h2 id="eurobarometer-802-2013">Eurobarometer 80.2 (2013)&lt;/h2>
&lt;p>GESIS Data Archive, Cologne. ZA5877 Data file Version 2.0.0,
&lt;a href="https://doi.org/10.4232/1.12792" target="_blank" rel="noopener">https://doi.org/10.4232/1.12792&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Data file: &lt;a href="https://search.gesis.org/research_data/ZA5877" target="_blank" rel="noopener">ZA5877&lt;/a>
data file (European Commission 2017).&lt;/li>
&lt;li>Questionnaire: &lt;a href="https://dbk.gesis.org/dbksearch/download.asp?id=54036" target="_blank" rel="noopener">Eurobarometer 80.2 Basic Bilingual
Questionnaire&lt;/a>&lt;/li>
&lt;li>Citation: &lt;a href="https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA5877&amp;amp;lang=en" target="_blank" rel="noopener">ZA5877
Bibtex&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;code>QA1a Which of the following do you consider to be the single most serious problem facing the world as a whole?&lt;/code>
(single choice)&lt;/p>
&lt;p>&lt;code>QA1b Which others do you consider to be serious problems?&lt;/code> (multiple
choice)&lt;/p>
&lt;p>&lt;code>QA2 And how serious a problem do you think climate change is at this moment? Please use a scale from 1 to 10, with '1' meaning it is &amp;quot;not at all a serious problem&lt;/code>
(scale 1-10)&lt;/p>
&lt;p>&lt;code>QA4 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QA4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU could benefit the EU economically&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QA5 Have you personally taken any action to fight climate change over the past six months?&lt;/code>
(binary)&lt;/p>
&lt;h2 id="eurobarometer-834-2015">Eurobarometer 83.4 (2015)&lt;/h2>
&lt;p>European Commission, Brussels; Directorate General Communication
COMM.A.1 ‘Strategy, Corporate Communication Actions and
Eurobarometer’ GESIS Data Archive, Cologne. ZA6595 Data file Version
3.0.0, &lt;a href="https://doi.org/10.4232/1.13146" target="_blank" rel="noopener">https://doi.org/10.4232/1.13146&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Data file: &lt;a href="https://search.gesis.org/research_data/ZA6595" target="_blank" rel="noopener">ZA6595&lt;/a>
data file (European Commission 2018).&lt;/li>
&lt;li>Questionnaire: &lt;a href="https://dbk.gesis.org/dbksearch/download.asp?id=57940" target="_blank" rel="noopener">Eurobarometer 83.4 Basic Bilingual
Questionnaire&lt;/a>&lt;/li>
&lt;li>Citation: &lt;a href="https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA6595&amp;amp;lang=en" target="_blank" rel="noopener">ZA6595
Bibtex&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="eurobarometer-871-2017">Eurobarometer 87.1 (2017)&lt;/h2>
&lt;p>European Commission, Brussels; Directorate General Communication,
COMM.A.1 ‘Strategic Communication’; European Parliament,
Directorate-General for Communication, Public Opinion Monitoring Unit
GESIS Data Archive, Cologne. ZA6861 Data file Version 1.2.0,
&lt;a href="https://doi.org/10.4232/1.12922" target="_blank" rel="noopener">https://doi.org/10.4232/1.12922&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Data file: &lt;a href="https://search.gesis.org/research_data/ZA6861" target="_blank" rel="noopener">ZA6861&lt;/a>
data file.&lt;/li>
&lt;li>Questionnaire: &lt;a href="https://dbk.gesis.org/dbksearch/download.asp?id=65967" target="_blank" rel="noopener">Eurobarometer 87.1 Basic Bilingual
Questionnaire&lt;/a>&lt;/li>
&lt;li>Citation: &lt;a href="https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA6861&amp;amp;lang=en" target="_blank" rel="noopener">ZA6861
Bibtex&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;code>QC1a Which of the following do you consider to be the single most serious problem facing the world as a whole?&lt;/code>
(single choice)&lt;/p>
&lt;p>&lt;code>QC1b Which others do you consider to be serious problems?&lt;/code> (multiple
choice)&lt;/p>
&lt;p>&lt;code>QC2 And how serious a problem do you think climate change is at this moment? Please use a scale from 1 to 10, with '1' meaning it is &amp;quot;not at all a serious problem&lt;/code>
(scale 1-10)&lt;/p>
&lt;p>&lt;code>Qc4 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>Qc4 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>Qc4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>Qc4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can increase the security of EU energy supplies&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>Qc4 To what extent do you agree or disagree with each of the following statements? - More public financial support should be given to the transition to clean energies even if it means subsidies to fossil fuels should be reduced.&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>Qc5 Have you personally taken any action to fight climate change over the past six months?&lt;/code>
(binary)&lt;/p>
&lt;h2 id="eurobarometer-902-2018">Eurobarometer 90.2 (2018)&lt;/h2>
&lt;p>European Commission, Brussels; Directorate General Communication,
COMM.A.3 ‘Media Monitoring and Eurobarometer’ GESIS Data Archive,
Cologne. ZA7488 Data file Version 1.0.0,
&lt;a href="https://doi.org/10.4232/1.13289" target="_blank" rel="noopener">https://doi.org/10.4232/1.13289&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Data file:
&lt;a href="https://dbk.gesis.org/dbksearch/sdesc2.asp?db=e&amp;amp;no=7488" target="_blank" rel="noopener">ZA7488&lt;/a>
data file (European Commission 2019a)&lt;/li>
&lt;li>Questionnaire: &lt;a href="https://dbk.gesis.org/dbksearch/download.asp?id=65967" target="_blank" rel="noopener">Eurobarometer 90.2 Basic Bilingual
Questionnaire&lt;/a>&lt;/li>
&lt;li>Citation: &lt;a href="https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA7488&amp;amp;lang=en" target="_blank" rel="noopener">ZA7488
Bibtex&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;code>QB5 To what extent do you agree or disagree with each of the following statements? - Fighting climate change and using energy more efficiently can boost the economy and jobs in the EU&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QB5 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QB5 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QB5 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can increase the security of EU energy supplies&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QB5 To what extent do you agree or disagree with each of the following statements? - More public financial support should be given to the transition to clean energies even if it means subsidies to fossil fuels should be reduced.&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;h2 id="eurobarometer-913-2019">Eurobarometer 91.3 (2019)&lt;/h2>
&lt;p>European Commission, Brussels; Directorate General Communication,
COMM.A.3 ‘Media Monitoring and Eurobarometer’ GESIS Data Archive,
Cologne. ZA7572 Data file Version 1.0.0,
&lt;a href="https://doi.org/10.4232/1.13372" target="_blank" rel="noopener">https://doi.org/10.4232/1.13372&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Data file:
&lt;a href="https://dbk.gesis.org/dbksearch/sdesc2.asp?db=e&amp;amp;no=7572" target="_blank" rel="noopener">ZA7572&lt;/a>
data file (European Commission 2019b).&lt;/li>
&lt;li>Questionnaire: &lt;a href="https://dbk.gesis.org/dbksearch/download.asp?id=66774" target="_blank" rel="noopener">Eurobarometer 91.3 Basic Bilingual
Questionnaire&lt;/a>&lt;/li>
&lt;li>Citation: &lt;a href="https://search.gesis.org/ajax/bibtex.php?type=research_data&amp;amp;docid=ZA7572&amp;amp;lang=en" target="_blank" rel="noopener">ZA7572
Bibtex&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;code>QB4 To what extent do you agree or disagree with each of the following statements? - Taking action on climate change will lead to innovation that will make EU companies more competitive (N)&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QB4 To what extent do you agree or disagree with each of the following statements? - Promoting EU expertise in new clean technologies to countries outside the EU can benefit the EU economically&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QB4 To what extent do you agree or disagree with each of the following statements? - Reducing fossil fuel imports from outside the EU can benefit the EU economically&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QB4 To what extent do you agree or disagree with each of the following statements? - Adapting to the adverse impacts of climate change can have positive outcomes for citizens in the EU&lt;/code>
(agreement-disagreement 4-scale)&lt;/p>
&lt;p>&lt;code>QB5 Have you personally taken any action to fight climate change over the past six months?&lt;/code>
(binary)&lt;/p>
&lt;h2 id="references">References&lt;/h2>
&lt;p>European Commission, Brussels. 2017. “Eurobarometer 80.2 (2013).” GESIS
Data Archive, Cologne. ZA5877 Data file Version 2.0.0,
&lt;a href="https://doi.org/10.4232/1.12792" target="_blank" rel="noopener">https://doi.org/10.4232/1.12792&lt;/a>. &lt;a href="https://doi.org/10.4232/1.12792" target="_blank" rel="noopener">https://doi.org/10.4232/1.12792&lt;/a>.&lt;/p>
&lt;p>———. 2018. “Eurobarometer 83.4 (2015).” GESIS Data Archive, Cologne.
ZA6595 Data file Version 3.0.0, &lt;a href="https://doi.org/10.4232/1.13146" target="_blank" rel="noopener">https://doi.org/10.4232/1.13146&lt;/a>.&lt;/p>
&lt;p>———. 2019a. “Eurobarometer 90.2 (2018).” GESIS Data Archive, Cologne.
ZA7488 Data file Version 1.0.0, &lt;a href="https://doi.org/10.4232/1.13289" target="_blank" rel="noopener">https://doi.org/10.4232/1.13289&lt;/a>.&lt;/p>
&lt;p>———. 2019b. “Eurobarometer 91.3 (2019).” GESIS Data Archive, Cologne.
ZA7572 Data file Version 1.0.0, &lt;a href="https://doi.org/10.4232/1.13372" target="_blank" rel="noopener">https://doi.org/10.4232/1.13372&lt;/a>.&lt;/p></description></item><item><title>Reproducible Survey Harmonization: retroharmonize Is Released</title><link>https://reprex-next.netlify.app/post/2020-09-21-retroharmonize_release/</link><pubDate>Mon, 21 Sep 2020 11:31:39 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2020-09-21-retroharmonize_release/</guid><description>&lt;p>Our original intention was to make surveying more accessible for music and creative industry partners by relying more on existing survey data and by better designing complementary, smaller surveys, because surveying and opinion polling are becoming increasingly expensive in the developed world. People are less and less likely to sit down for an interview in their homes. We have tried to harmonize our custom surveys, particularly those with Kantar in Hungary and Focus in Slovakia, with existing EU projects. But we ended up making a part of international survey harmonization, across countries and throughout the years, easier to automate.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/packages/ab_plot1.png" alt="Harmonized results from Afrobarometer" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Surveys are like sensors for the natural sciences and industrial production. They are essential for almost any social and economic statistical indicator: for calculating inflation, parts of GDP, or participation in education programs. Making surveys easier to harmonize and exploiting more of the already existing survey data can bring down research costs and increase research value at the same time. (See our earlier blog post &lt;a href="https://dataobservatory.eu/post/2020-07-10-retroharmonize/" target="_blank" rel="noopener">Increase The Value Of Market Research With Open Data And Survey Harmonization&lt;/a>.)&lt;/p>
&lt;p>So, if you are an R user, you can use &lt;code>install.packages(&amp;quot;retroharmonize&amp;quot;)&lt;/code> to get the released 0.1.13 version and work through our tutorials with real Eurobarometer or Afrobarometer microdata. With &lt;code>devtools::install_github(&amp;quot;antaldaniel/retroharmonize&amp;quot;)&lt;/code> you can already install the current development version 0.1.14, which handles Perl-like regular expressions; these will be necessary for our next tutorial, now in the making, for &lt;a href="https://www.arabbarometer.org/" target="_blank" rel="noopener">Arab Barometer&lt;/a>.&lt;/p>
&lt;p>&lt;strong>Related&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">retroharmonize package website&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/antaldaniel/retroharmonize/" target="_blank" rel="noopener">retroharmonize on github&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>retroharmonize R package for survey harmonization</title><link>https://reprex-next.netlify.app/software/retroharmonize/</link><pubDate>Tue, 25 Aug 2020 00:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/software/retroharmonize/</guid><description>&lt;h2 id="retrospective-data-harmonization">Retrospective data harmonization&lt;/h2>
&lt;p>The aim of &lt;code>retroharmonize&lt;/code> is to provide tools for reproducible
retrospective (ex-post) harmonization of datasets that contain variables
measuring the same concepts but coded in different ways. Ex-post data
harmonization enables better use of existing data and creates new
research opportunities. For example, harmonizing data from different
countries enables cross-national comparisons, while merging data from
different time points makes it possible to track changes over time.&lt;/p>
&lt;p>Retrospective data harmonization is associated with challenges including
conceptual issues with establishing equivalence and comparability,
practical complications of having to standardize the naming and coding
of variables, technical difficulties with merging data stored in
different formats, and the need to document a large number of data
transformations. The &lt;code>retroharmonize&lt;/code> package assists with the latter
three components, freeing up the capacity of researchers to focus on the
first.&lt;/p>
&lt;p>Specifically, the &lt;code>retroharmonize&lt;/code> package proposes a reproducible
workflow, including a new class for storing data together with the
harmonized and original metadata, as well as functions for importing
data from different formats, harmonizing data and metadata, documenting
the harmonization process, and converting between data types. See
&lt;a href="https://retroharmonize.dataobservatory.eu/reference/retrohamonize.html" target="_blank" rel="noopener">here&lt;/a>
for an overview of the functionalities.&lt;/p>
&lt;p>The new &lt;code>labelled_spss_survey()&lt;/code> class is an extension of &lt;a href="https://haven.tidyverse.org/reference/labelled_spss.html" target="_blank" rel="noopener">haven’s labelled_spss class&lt;/a>. It not
only preserves variable and value labels and the user-defined missing
range, but also gives an identifier, for example, the filename or the
wave number, to the vector. Additionally, it enables the preservation –
as metadata attributes – of the original variable names, labels, and
value codes and labels, from the source data, in addition to the
harmonized variable names, labels, and value codes and labels. This way,
the harmonized data also contain the pre-harmonization record. The
stored original metadata can be used for validation and documentation
purposes.&lt;/p>
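&lt;p>As a minimal sketch of the class described above, the following creates a small &lt;code>labelled_spss_survey()&lt;/code> vector. The toy values, the labels and the &lt;code>survey1&lt;/code> identifier are hypothetical and only serve as an illustration; consult the package reference for the full list of arguments.&lt;/p>
&lt;pre>&lt;code class="language-r">library(retroharmonize)

# Hypothetical toy responses: 1 = yes, 0 = no, 8 = declined (user-defined missing)
x1 &amp;lt;- labelled_spss_survey(
  x = c(1, 1, 0, 8, 8, 8),
  labels = c(&amp;quot;yes&amp;quot; = 1, &amp;quot;no&amp;quot; = 0, &amp;quot;declined&amp;quot; = 8),
  label = &amp;quot;Do you agree?&amp;quot;,
  na_values = 8,
  id = &amp;quot;survey1&amp;quot;
)

# Inspect the metadata stored as attributes alongside the values
str(attributes(x1))
&lt;/code>&lt;/pre>
&lt;p>Printing the attributes is a quick way to check which labels, missing values and identifier travel with the vector.&lt;/p>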
&lt;p>The vignette &lt;a href="https://retroharmonize.dataobservatory.eu/articles/labelled_spss_survey.html" target="_blank" rel="noopener">Working With The labelled_spss_survey Class&lt;/a>
provides more information about the &lt;code>labelled_spss_survey()&lt;/code> class.&lt;/p>
&lt;p>In &lt;a href="https://retroharmonize.dataobservatory.eu/articles/harmonize_labels.html" target="_blank" rel="noopener">Harmonize Value Labels&lt;/a>
we discuss the characteristics of the &lt;code>labelled_spss_survey()&lt;/code> class and
demonstrates the problems that using this class solves.&lt;/p>
&lt;p>We also provide three extensive case studies illustrating how the
&lt;code>retroharmonize&lt;/code> package can be used for ex-post harmonization of data
from cross-national surveys:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html" target="_blank" rel="noopener">Afrobarometer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html" target="_blank" rel="noopener">Arab
Barometer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html" target="_blank" rel="noopener">Eurobarometer&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>The creators of &lt;code>retroharmonize&lt;/code> are not affiliated with
Afrobarometer, Arab Barometer, Eurobarometer, or the organizations that
design, produce or archive their surveys.&lt;/p>
&lt;p>We started building experimental APIs that run retroharmonize
regularly to improve known statistical data sources. See: &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>, &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a>, &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data Observatory&lt;/a>.&lt;/p>
&lt;h2 id="citations-and-related-work">Citations and related work&lt;/h2>
&lt;h3 id="citing-the-data-sources">Citing the data sources&lt;/h3>
&lt;p>Our package has been tested on the microdata of three harmonized surveys.
Because &lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">retroharmonize&lt;/a> is
not affiliated with any of these data sources, to replicate our
tutorials or work with the data, you have to download the data files from
these sources, and you have to cite those sources in your work.&lt;/p>
&lt;p>&lt;strong>Afrobarometer&lt;/strong> data: cite
&lt;a href="https://afrobarometer.org/data/" target="_blank" rel="noopener">Afrobarometer&lt;/a>. &lt;strong>Arab Barometer&lt;/strong>
data: cite &lt;a href="https://www.arabbarometer.org/survey-data/data-downloads/" target="_blank" rel="noopener">Arab
Barometer&lt;/a>.
&lt;strong>Eurobarometer&lt;/strong> data: The
&lt;a href="https://ec.europa.eu/commfrontoffice/publicopinion/index.cfm" target="_blank" rel="noopener">Eurobarometer&lt;/a>
raw data and related documentation (questionnaires, codebooks, etc.) are
made available by &lt;em>GESIS&lt;/em>, &lt;em>ICPSR&lt;/em> and through the &lt;em>Social Science Data
Archive&lt;/em> networks. You should cite your source; in our examples, we rely
on the
&lt;a href="https://www.gesis.org/en/eurobarometer-data-service/search-data-access/data-access" target="_blank" rel="noopener">GESIS&lt;/a>
data files.&lt;/p>
&lt;h3 id="citing-the-retroharmonize-r-package">Citing the retroharmonize R package&lt;/h3>
&lt;p>For the main developer and contributors, see the
&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">package&lt;/a> homepage.&lt;/p>
&lt;p>This work can be freely used, modified and distributed under the GPL-3
license:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-r" data-lang="r">&lt;span class="line">&lt;span class="cl">&lt;span class="nf">citation&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s">&amp;#34;retroharmonize&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; To cite package &amp;#39;retroharmonize&amp;#39; in publications use:&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; Daniel Antal (2021). retroharmonize: Ex Post Survey Data&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; Harmonization. R package version 0.1.17.&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; https://retroharmonize.dataobservatory.eu/&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; A BibTeX entry for LaTeX users is&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; &lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; @Manual{,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; title: {retroharmonize: Ex Post Survey Data Harmonization},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; author: {Daniel Antal},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; year: {2021},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; doi: {10.5281/zenodo.5006056},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; note: {R package version 0.1.17},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; url: {https://retroharmonize.dataobservatory.eu/},&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1">#&amp;gt; }&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="contact">Contact&lt;/h3>
&lt;p>For contact information and contributors, see the
&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">package&lt;/a> homepage.&lt;/p>
&lt;h3 id="code-of-conduct">Code of Conduct&lt;/h3>
&lt;p>Please note that the &lt;code>retroharmonize&lt;/code> project is released with a
&lt;a href="https://www.contributor-covenant.org/version/2/0/code_of_conduct/" target="_blank" rel="noopener">Contributor Code of Conduct&lt;/a>.
By contributing to this project, you agree to abide by its terms.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Click the &lt;em>Cite&lt;/em> button above to demo the feature to enable visitors to import publication metadata into their reference management software.
&lt;/div>
&lt;/div></description></item><item><title>CEEMID</title><link>https://reprex-next.netlify.app/project/ceemid/</link><pubDate>Wed, 27 Feb 2019 00:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/project/ceemid/</guid><description>&lt;p>&lt;strong>CEEMID&lt;/strong> was multi-country project that was a predecessor of Reprex’s &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>. It transferred thousands of indicators to the Digital Music Observatory and offered it to the future European Music Observatory.&lt;/p>
&lt;p>The CEEMID project (2014-2020) formed the basis of our first data observatory: the challenges of a 12-country data collection and sharing project forced our team to find novel software, data governance and data processing solutions that fitted a very fragmented, multi-language music industry with many internal and external challenges. We showed that we can provide rich, data-driven evidence in seemingly data-poor countries. We can find alternative data sources when there are no data vendors, when no official government statistics are present, or when important data assets cannot be used due to business confidentiality clauses.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-the-central-european-music-industry-report-2020httpsceereport2020ceemideu-concluded-the-ceemid-project-showing-that-building-meaningful-statistical-indicators-for-the-live-performance-recording-and-publishing-sides-of-the-music-industry-is-possible-even-in-seemingly-data-poor-emerging-and-future-markets-our-report-was-presented-as-a-best-practicehttpsmusicdataobservatoryeupost2020-01-30-ceereport-by-the-european-commission-and-the-geothe-institute-in-2020">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="The [Central European Music Industry Report 2020](https://ceereport2020.ceemid.eu/) concluded the CEEMID project showing that building meaningful statistical indicators for the live performance, recording and publishing sides of the music industry is possible even in seemingly data poor emerging and future markets. Our Report was presented as a [best practice](https://music.dataobservatory.eu/post/2020-01-30-ceereport/) by the European Commission and the Geothe Institute in 2020." srcset="
/media/img/reports/ceereport_2020/frontcover_wide_hu0a0f30584267a2cdec6691beefdbbc9f_34932_9ec4301f74d6efc3db691db5b5db6a66.webp 400w,
/media/img/reports/ceereport_2020/frontcover_wide_hu0a0f30584267a2cdec6691beefdbbc9f_34932_e012b143d3ea788b7a97acc9447d0897.webp 760w,
/media/img/reports/ceereport_2020/frontcover_wide_hu0a0f30584267a2cdec6691beefdbbc9f_34932_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://reprex-next.netlify.app/media/img/reports/ceereport_2020/frontcover_wide_hu0a0f30584267a2cdec6691beefdbbc9f_34932_9ec4301f74d6efc3db691db5b5db6a66.webp"
width="400"
height="200"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
The &lt;a href="https://ceereport2020.ceemid.eu/" target="_blank" rel="noopener">Central European Music Industry Report 2020&lt;/a> concluded the CEEMID project showing that building meaningful statistical indicators for the live performance, recording and publishing sides of the music industry is possible even in seemingly data poor emerging and future markets. Our Report was presented as a &lt;a href="https://music.dataobservatory.eu/post/2020-01-30-ceereport/" target="_blank" rel="noopener">best practice&lt;/a> by the European Commission and the Geothe Institute in 2020.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>In 2019 we offered CEEMID as one possible alternative to building the European Music Observatory. In our experience and understanding, the music industry has many failed international data projects because of the inherent conflicts of interest among big and small countries, authors and producers, and producers and performers. In 2020 we launched our &lt;a href="https://reprex-next.netlify.app/observatories/music">Digital Music Observatory&lt;/a>.&lt;/p>
&lt;p>From an early stage, there has been interest in our solutions from more and more industries. During the validation of our observatory’s product/market fit, we realized that we had found a novel solution to a problem that is not at all unique to the music industry. The fragmentation of data assets among the institutional boundaries of small enterprises and NGOs, the fragmentation of research budgets that rules out comprehensive data collection programs, and the reliance on hard-to-integrate public and third-party data of questionable quality are present in almost all business and policy domains.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-the-case-study-in-belgiumhttpsgreendealdataobservatoryeupost2021-04-23-belgium-flood-insurance-brings-together-open-data-in-a-novel-way-from-satellite-hydrological-opinion-polling-and-tax-administration-data-to-show-the-geographical-overlap-of-catastrophic-drought-and-flood-risk-with-the-local-political-awareness-and-financial-capacity-to-manage-it-our-approach-to-music-data-turned-out-to-be-applicable-in-many-situations-when-the-data-and-the-research-capacities-are-fragmented">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="The [Case Study in Belgium](https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/) brings together open data in a novel way from satellite, hydrological, opinion polling and tax administration data to show the geographical overlap of catastrophic drought and flood risk with the local political awareness and financial capacity to manage it. Our approach to music data turned out to be applicable in many situations when the data and the research capacities are fragmented." srcset="
/media/slides/Crunchconf_2021/Slide6_hu0af4a93231d6f282471b1ca73533e509_486981_7b5fa0d4e7334a14dac31c4342cb051b.webp 400w,
/media/slides/Crunchconf_2021/Slide6_hu0af4a93231d6f282471b1ca73533e509_486981_32bd08887b964300ff44300fa3635464.webp 760w,
/media/slides/Crunchconf_2021/Slide6_hu0af4a93231d6f282471b1ca73533e509_486981_1200x1200_fit_q75_h2_lanczos.webp 1200w"
src="https://reprex-next.netlify.app/media/slides/Crunchconf_2021/Slide6_hu0af4a93231d6f282471b1ca73533e509_486981_7b5fa0d4e7334a14dac31c4342cb051b.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
The &lt;a href="https://greendeal.dataobservatory.eu/post/2021-04-23-belgium-flood-insurance/" target="_blank" rel="noopener">Case Study in Belgium&lt;/a> brings together open data in a novel way from satellite, hydrological, opinion polling and tax administration data to show the geographical overlap of catastrophic drought and flood risk with the local political awareness and financial capacity to manage it. Our approach to music data turned out to be applicable in many situations when the data and the research capacities are fragmented.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>We developed reliable software tools and the know-how to bring up reusable open government and open science data. In developed countries such data are available upon request, but like raw diamonds: without the polishing of data scientists, documentation, or modern databases. We learned how to build a decentralized approach to data sharing and resource pooling using the agile open collaboration method of open-source software development. We became experts in handling legally open data and freedom of information requests on an industrial scale. After the launch of the Digital Music Observatory, we started building three new observatories with various partners in Computational Antitrust, Green Deal (climate change) and the broader Cultural Creative Sectors &amp;amp; Industries.&lt;/p>
&lt;p>CEEMID did not work only with public and openly accessible data. We used these novel sources when other data was not available, legally could not be used, or was prohibitively expensive. But most of the value we created for music rightsholders required the management of highly confidential data. We know that a public data observatory is not ideal for all potential users. We are piloting fully private observatories, and hybrid observatories that blend public (and therefore widely trustworthy) and proprietary data, based on the experience of the ‘public’ and ‘private’ layers of CEEMID.&lt;/p>
&lt;td style="text-align: center;">
&lt;figure id="figure-historically-ceemid-started-out-as-the-_central-and-eastern-european-music-industry-databases_-out-of-necessity-following-a-cisac-good-governance-seminar-for-european-societieshttpsceemideupost2013-11-18_cisac_goodgov-in-2013-and-eventually-grew-out-of-an-abandoned-gesac-project-the-adoption-of-european-single-market-and-copyright-rules-and-the-increased-activity-of-competition-authority-and-regulators-required-a-more-structured-approach-to-set-collective-royalty-and-compensations-tariffs-in-a-region-that-was-regarded-traditionally-as-data-poor-with-lower-quantity-of-industry-and-government-data-sources-available">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Historically CEEMID started out as the _Central and Eastern European Music Industry Databases_ out of necessity following a [CISAC Good Governance Seminar for European Societies](https://ceemid.eu/post/2013-11-18_cisac_goodgov/) in 2013, and eventually grew out of an abandoned GESAC project. The adoption of European single market and copyright rules, and the increased activity of competition authority and regulators required a more structured approach to set collective royalty and compensations tariffs in a region that was regarded traditionally as data-poor with lower quantity of industry and government data sources available." srcset="
/media/img/slides/cisac_good_governance_2013_hub4b4b5255bd977bbebff8bcf47ad0e98_125454_2a3799832a54545b105f9274407e0bd6.webp 400w,
/media/img/slides/cisac_good_governance_2013_hub4b4b5255bd977bbebff8bcf47ad0e98_125454_a0e06b6864abe2a0479b94c611245453.webp 760w,
/media/img/slides/cisac_good_governance_2013_hub4b4b5255bd977bbebff8bcf47ad0e98_125454_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/slides/cisac_good_governance_2013_hub4b4b5255bd977bbebff8bcf47ad0e98_125454_2a3799832a54545b105f9274407e0bd6.webp"
width="760"
height="526"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Historically, CEEMID started out as the &lt;em>Central and Eastern European Music Industry Databases&lt;/em> out of necessity following a &lt;a href="https://ceemid.eu/post/2013-11-18_cisac_goodgov/" target="_blank" rel="noopener">CISAC Good Governance Seminar for European Societies&lt;/a> in 2013, and eventually grew out of an abandoned GESAC project. The adoption of European single market and copyright rules, and the increased activity of competition authorities and regulators, required a more structured approach to setting collective royalty and compensation tariffs in a region that was traditionally regarded as data-poor, with a lower quantity of industry and government data sources available.
&lt;/figcaption>&lt;/figure>&lt;/td>
&lt;p>In 2014 three societies, Artisjus, HDS and SOZA, realized that they needed to make further efforts to modernize the way they measure their own economic impact and the economic value of their licenses, in order to remain competitive in advocating their interests vis-à-vis domestic governments, international organizations like CISAC and GESAC, and the European Union. They signed a Memorandum of Understanding with their consultant to set up the CEEMID databases and to harmonize their efforts. The adoption of European single market and copyright rules, and the increased activity of competition authorities and regulators, required a more structured approach to setting collective royalty and compensation tariffs in a region that was traditionally regarded as data-poor, with a lower quantity of industry and government data sources available. The databases quickly covered the whole European area. (Read more about our &lt;a href="https://documentation.ceemid.eu/index.php?title=Main_Page#Data_Coverage" target="_blank" rel="noopener">data coverage&lt;/a> and our pan-European &lt;a href="https://documentation.ceemid.eu/index.php?title=Main_Page#Geographic_Coverage" target="_blank" rel="noopener">geographical coverage&lt;/a>.)&lt;/p>
&lt;p>We believe that CEEMID could fulfill the functions of the European Music Observatory if it found sustainable financing that makes access to all our data open. We need to avoid the &lt;a href="https://en.wikipedia.org/wiki/Tragedy_of_the_commons" target="_blank" rel="noopener">tragedy of the commons&lt;/a>, where only a few industry users contribute to the financing of thousands of indicators that could potentially benefit tens of thousands of stakeholders in the EU.&lt;/p></description></item></channel></rss>