<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>data collection | Reprex</title><link>https://reprex-next.netlify.app/tag/data-collection/</link><atom:link href="https://reprex-next.netlify.app/tag/data-collection/index.xml" rel="self" type="application/rss+xml"/><description>data collection</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 28 Jun 2021 09:00:00 +0000</lastBuildDate><image><url>https://reprex-next.netlify.app/media/icon_hub9491570ac57158c0eeecc95c95b13e5_20247_512x512_fill_lanczos_center_3.png</url><title>data collection</title><link>https://reprex-next.netlify.app/tag/data-collection/</link></image><item><title>Including Indicators from Arab Barometer in Our Observatory</title><link>https://reprex-next.netlify.app/post/2021-06-28-arabbarometer/</link><pubDate>Mon, 28 Jun 2021 09:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-28-arabbarometer/</guid><description>&lt;p>&lt;em>A new version of the retroharmonize R package – which supports retrospective, ex post harmonization of survey data – was released yesterday after peer-review on CRAN. It allows us to compare opinion polling data from the Arab Barometer with the Eurobarometer and Afrobarometer. This is the first version that is released in the rOpenGov community, a community of R package developers on open government data analytics and related topics.&lt;/em>&lt;/p>
&lt;p>Surveys are the most important data sources in social and economic
statistics – they ask people about their lives, their attitudes and
self-reported actions, or record data from companies and NGOs. Survey
harmonization makes survey data comparable across time and countries. It
is very important, because often we do not know without comparison if an
indicator value is &lt;em>low&lt;/em> or &lt;em>high&lt;/em>. If 40% of the people think that
&lt;em>climate change is a very serious problem&lt;/em>, it does not really tell us
much without knowing what percentage of the people answered this
question similarly a year ago, or in other parts of the world.&lt;/p>
&lt;p>With the help of Ahmed Shaibani and Yousef Ibrahim, we created a third
case study after the
&lt;a href="https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html" target="_blank" rel="noopener">Eurobarometer&lt;/a>,
and
&lt;a href="https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html" target="_blank" rel="noopener">Afrobarometer&lt;/a>,
about working with the &lt;a href="https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html" target="_blank" rel="noopener">Arab
Barometer&lt;/a>
harmonized survey data files.&lt;/p>
&lt;p>&lt;em>Ex ante&lt;/em> survey harmonization means that researchers design
questionnaires that ask the same questions with the same survey
methodology at repeated, distinct times (waves), or across different
countries with carefully harmonized question translations. &lt;em>Ex post&lt;/em>
harmonization means that the resulting data has the same variable
names and the same variable coding, and can be joined into a tidy data frame
for joint statistical analysis. While seemingly a simple task, it
involves plenty of metadata adjustments, because established survey
programs like Eurobarometer, Afrobarometer or Arab Barometer have
several decades of history, and several decades of coding practices and
file formatting legacy.&lt;/p>
&lt;ul>
&lt;li>&lt;em>Variable harmonization&lt;/em> means that if the same question is called
&lt;code>Q108&lt;/code> in one microdata source and &lt;code>eval-parl-elections&lt;/code> in another,
then we make sure that both get the same harmonized, machine-readable
name without spaces and special characters.&lt;/li>
&lt;li>&lt;em>Variable label harmonization&lt;/em> means that the same questionnaire
items get the same numeric coding and same categorical labels.&lt;/li>
&lt;li>&lt;em>Missing case harmonization&lt;/em> means that various forms of missingness
are treated the same way.&lt;/li>
&lt;/ul>
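&lt;p>The variable name harmonization described above can be illustrated with a minimal sketch. It is written in Python for illustration only and does not use the retroharmonize API. The function names and the &lt;code>CROSSWALK&lt;/code> table are hypothetical, and the harmonized name &lt;code>eval_parl_elections&lt;/code> is an assumed target name.&lt;/p>

```python
import re
import unicodedata


def harmonize_var_name(raw_name):
    """Return a lowercase snake_case name without spaces or special characters."""
    # Strip accents so that non-ASCII letters do not end up in the name.
    ascii_name = (
        unicodedata.normalize("NFKD", raw_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one underscore.
    return re.sub(r"[^0-9a-zA-Z]+", "_", ascii_name).strip("_").lower()


# Hypothetical crosswalk: each source-specific name maps to one shared name.
CROSSWALK = {
    "Q108": "eval_parl_elections",
    "eval-parl-elections": "eval_parl_elections",
}


def harmonized_name(raw_name):
    """Look the name up in the crosswalk, else fall back to plain cleaning."""
    return CROSSWALK.get(raw_name, harmonize_var_name(raw_name))
```

&lt;p>A lookup table is needed in addition to the cleaning step, because an opaque questionnaire code such as &lt;code>Q108&lt;/code> carries no information that a purely mechanical rule could turn into a meaningful name.&lt;/p>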
&lt;figure id="figure-for-the-evaluation-of-the-economic-situation-dataset-get-the-country-averages-and-aggregates-from-zenodohttpdoiorg105281zenodo5036432-and-the-plot-in-jpg-or-png-from-figsharehttpsfigsharecomarticlesfigurearab_barometer_5_econ_eval_by_country_png14865498">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/blogposts_2021/arab_barometer_5_evon_eval_by_country.png" alt="For the evaluation of the economic situation dataset, get the country averages and aggregates from [Zenodo](http://doi.org/10.5281/zenodo.5036432), and the plot in `jpg` or `png` from [figshare](https://figshare.com/articles/figure/arab_barometer_5_econ_eval_by_country_png/14865498)." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
For the evaluation of the economic situation dataset, get the country averages and aggregates from &lt;a href="http://doi.org/10.5281/zenodo.5036432" target="_blank" rel="noopener">Zenodo&lt;/a>, and the plot in &lt;code>jpg&lt;/code> or &lt;code>png&lt;/code> from &lt;a href="https://figshare.com/articles/figure/arab_barometer_5_econ_eval_by_country_png/14865498" target="_blank" rel="noopener">figshare&lt;/a>.
&lt;/figcaption>&lt;/figure>
&lt;p>In our new &lt;a href="https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html" target="_blank" rel="noopener">Arab Barometer case
study&lt;/a>,
the evaluation of parliamentary elections has the following raw labels across
the survey waves. We code them consistently as &lt;code>1: free_and_fair&lt;/code>, &lt;code>2: some_minor_problems&lt;/code>,
&lt;code>3: some_major_problems&lt;/code> and &lt;code>4: not_free&lt;/code>.&lt;/p>
&lt;table>
&lt;colgroup>
&lt;col style="width: 50%" />
&lt;col style="width: 50%" />
&lt;/colgroup>
&lt;tbody>
&lt;tr class="odd">
&lt;td style="text-align: left;">“0. missing”&lt;/td>
&lt;td style="text-align: left;">“1. they were completely free and fair”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“2. they were free and fair, with some minor problems”&lt;/td>
&lt;td style="text-align: left;">“3. they were free and fair, with some major problems”&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td style="text-align: left;">“4. they were not free and fair”&lt;/td>
&lt;td style="text-align: left;">“8. i don’t know”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“9. declined to answer”&lt;/td>
&lt;td style="text-align: left;">“Missing”&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td style="text-align: left;">“They were completely free and fair”&lt;/td>
&lt;td style="text-align: left;">“They were free and fair, with some minor breaches”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“They were free and fair, with some major breaches”&lt;/td>
&lt;td style="text-align: left;">“They were not free and fair”&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td style="text-align: left;">“Don’t know”&lt;/td>
&lt;td style="text-align: left;">“Refuse”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“Completely free and fair”&lt;/td>
&lt;td style="text-align: left;">“Free and fair, but with minor problems”&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td style="text-align: left;">“Free and fair, with major problems”&lt;/td>
&lt;td style="text-align: left;">“Not free or fair”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“Don’t know (Do not read)”&lt;/td>
&lt;td style="text-align: left;">“Decline to answer (Do not read)”&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
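&lt;p>A minimal sketch of how the raw labels in the table could be mapped to the four harmonized codes follows. It is written in Python for illustration only and is not the retroharmonize implementation. The function name and the keyword rules are assumptions that happen to fit the labels listed above; other survey waves may need different rules.&lt;/p>

```python
import re


def harmonize_label(raw_label):
    """Map one raw answer label to a (code, harmonized_label) pair."""
    # Drop a leading numeric prefix such as "2. " and normalize the case.
    text = re.sub(r"^\d+\.\s*", "", raw_label.strip()).lower()
    if "minor" in text:
        return (2, "some_minor_problems")
    if "major" in text:
        return (3, "some_major_problems")
    # Check the negative wording before the positive one, because
    # "not free and fair" also contains "free and fair".
    if "not free" in text:
        return (4, "not_free")
    if "free and fair" in text:
        return (1, "free_and_fair")
    # Everything else is some form of missingness, to be handled separately.
    return (None, "missing")
```

&lt;p>The order of the checks matters: the more specific wordings are tested first, so that the generic &lt;em>free and fair&lt;/em> rule only catches the fully positive answers.&lt;/p>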
&lt;p>Of course, this harmonization is essential to get clean results like this:&lt;/p>
&lt;figure id="figure-for-evaluation-or-reuse-of-parliamentary-elections-dataset-get-the-replication-data-and-the-code-from-the-zenodohhttpsdoiorg105281zenodo5034759-open-repository">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="To evaluate or reuse the parliamentary elections dataset, get the replication data and the code from the [Zenodo](https://doi.org/10.5281/zenodo.5034759) open repository." srcset="
/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_30b9d9bccbe8f347c912dbe10ef5159c.webp 400w,
/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_f7e62366b8310160e9cdd16714a5ac44.webp 760w,
/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_30b9d9bccbe8f347c912dbe10ef5159c.webp"
width="506"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
To evaluate or reuse the parliamentary elections dataset, get the replication data and the code from the &lt;a href="https://doi.org/10.5281/zenodo.5034759">Zenodo&lt;/a> open repository.
&lt;/figcaption>&lt;/figure>
&lt;p>In our case study, we had three forms of missingness: the respondent
&lt;em>did not know&lt;/em> the answer, the respondent &lt;em>did not want&lt;/em> to answer, and
finally, in some cases the &lt;em>respondent was not asked&lt;/em>, because the
country held no parliamentary elections. In numerical processing, for
example when calculating averages, all these answers must be left out,
but in a more detailed, categorical analysis they represent very
different cases. A high level of refusal to answer may in itself indicate
that democratic opinion formation is being suppressed.&lt;/p>
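&lt;p>The two treatments of missingness can be sketched as follows. This is an illustrative Python example and not the retroharmonize API. The labels &lt;code>do_not_know&lt;/code>, &lt;code>declined&lt;/code> and &lt;code>inap&lt;/code> (not asked) and both function names are hypothetical.&lt;/p>

```python
from collections import Counter

# Hypothetical harmonized labels for the three forms of missingness.
MISSING = {"do_not_know", "declined", "inap"}


def mean_valid(responses):
    """Average the numeric codes, leaving out every form of missingness."""
    valid = [r for r in responses if r not in MISSING]
    return sum(valid) / len(valid) if valid else None


def missing_shares(responses):
    """Share of each missingness category among all responses."""
    if not responses:
        return {}
    counts = Counter(r for r in responses if r in MISSING)
    return {label: counts[label] / len(responses) for label in sorted(MISSING)}


# Made-up answers: numeric codes 1-4 mixed with missingness labels.
answers = [1, 2, 4, "declined", "do_not_know", "declined", 3, "inap"]
print(mean_valid(answers))
print(missing_shares(answers))
```

&lt;p>Keeping the three categories distinct until the last step means that the same harmonized variable can serve both the numeric average and a categorical analysis of refusals.&lt;/p>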
&lt;p>Survey harmonization with many countries entails tens of thousands of
small data management tasks, which, unless automatically documented,
logged, and created with reproducible code, make for a hopelessly
error-prone process. We believe that our open-source software will bring
to light much new statistical information that, while legally
open, was never processed due to the large investment needed.&lt;/p>
&lt;p>We also started building experimental APIs that serve data produced by running
&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">retroharmonize&lt;/a> regularly.
We will place cultural access and participation data in the &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital
Music Observatory&lt;/a>, climate
awareness, policy support and self-reported mitigation strategies into
the &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data
Observatory&lt;/a>, and economy and
well-being data into our &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data
Observatory&lt;/a>.&lt;/p>
&lt;h2 id="further-plans">Further plans&lt;/h2>
&lt;p>Retrospective survey harmonization is a far more complex task than this
blogpost suggests, because established survey programs have gathered decades of legacy data in legacy coding schemes and legacy file formats. Putting the data right, and especially putting the invaluable descriptive and administrative (processing) metadata right, is a huge undertaking. We are releasing example code, datasets and charts
for researchers to compare our harmonized results with theirs, and
improve our software.&lt;/p>
&lt;h3 id="use-our-software">Use our software&lt;/h3>
&lt;p>The &lt;code>retroharmonize&lt;/code> R package can be freely used, modified and
distributed under the GPL-3 license. For the main developer and
contributors, see the
&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">package&lt;/a> homepage. If you
use it for your work, please kindly cite it as:&lt;/p>
&lt;p>Daniel Antal (2021). retroharmonize: Ex Post Survey Data Harmonization.
R package version 0.1.17. &lt;a href="https://doi.org/10.5281/zenodo.5034752" target="_blank" rel="noopener">https://doi.org/10.5281/zenodo.5034752&lt;/a>&lt;/p>
&lt;p>Download the &lt;a href="https://reprex-next.netlify.app/media/bibliography/cite-retroharmonize.bib" target="_blank">BibLaTeX entry&lt;/a>.&lt;/p>
&lt;h3 id="tutorial-to-work-with-the-arab-barometer-survey-data">Tutorial to work with the Arab Barometer survey data&lt;/h3>
&lt;p>Daniel Antal, &amp;amp; Ahmed Shaibani. (2021, June 26). Case Study: Working
With Arab Barometer Surveys for the retroharmonize R package (Version
0.1.6). Zenodo. &lt;a href="https://doi.org/10.5281/zenodo.5034759" target="_blank" rel="noopener">https://doi.org/10.5281/zenodo.5034759&lt;/a>&lt;/p>
&lt;p>Use the replication data to report potential
&lt;a href="https://github.com/rOpenGov/retroharmonize/issues" target="_blank" rel="noopener">issues&lt;/a> and
to suggest improvements to the code:&lt;/p>
&lt;p>Daniel Antal, &amp;amp; Ahmed Shaibani. (2021). Replication Data for the
retroharmonize R Package Case Study: Working With Arab Barometer Surveys
(Version 0.1.6) [Data set]. Zenodo.
&lt;a href="https://doi.org/10.5281/zenodo.5034741" target="_blank" rel="noopener">https://doi.org/10.5281/zenodo.5034741&lt;/a>&lt;/p>
&lt;h3 id="experimental-api">Experimental API&lt;/h3>
&lt;p>We are also experimenting with the automated placement of authoritative
and citeable figures and datasets in open repositories. For the
evaluation of the economic situation dataset, get the country averages and aggregates from
&lt;a href="http://doi.org/10.5281/zenodo.5036432" target="_blank" rel="noopener">Zenodo&lt;/a>, and the plot in &lt;code>jpg&lt;/code>
or &lt;code>png&lt;/code> from &lt;a href="https://figshare.com/articles/figure/arab_barometer_5_econ_eval_by_country_png/14865498" target="_blank" rel="noopener">figshare&lt;/a>.
Our plan is to release open data in a modern API with rich descriptive
metadata meeting the &lt;em>Dublin Core&lt;/em> and &lt;em>DataCite&lt;/em> standards, and further
administrative metadata for correct coding, joining and further
manipulating our data, or for easy import into your database.&lt;/p>
&lt;h3 id="join-our-open-source-effort">Join our open source effort&lt;/h3>
&lt;p>Want to help us improve our open data service? Include
&lt;a href="https://www.latinobarometro.org/lat.jsp" target="_blank" rel="noopener">Latinobarómetro&lt;/a> and the
&lt;a href="https://caucasusbarometer.org/en/datasets/" target="_blank" rel="noopener">Caucasus Barometer&lt;/a> in our
offering? Join the rOpenGov community of R package developers, and our
open collaboration to create the automated data observatories. We are
not only looking for
&lt;a href="https://reprex-next.netlify.app/authors/developer/">developers&lt;/a>,
but &lt;a href="https://reprex-next.netlify.app/authors/curator/">data
curators&lt;/a> and
&lt;a href="https://reprex-next.netlify.app/authors/team/">service design
associates&lt;/a>, too.&lt;/p></description></item><item><title>Open Data - The New Gold Without the Rush</title><link>https://reprex-next.netlify.app/post/2021-06-18-gold-without-rush/</link><pubDate>Fri, 18 Jun 2021 17:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-18-gold-without-rush/</guid><description>&lt;p>&lt;em>If open data is the new gold, why do even those who release it fail to reuse it? We created an open collaboration of data curators and open-source developers to dig into novel open data sources and/or increase the usability of existing ones. We transform reproducible research software into research-as-service.&lt;/em>&lt;/p>
&lt;p>Every year, the EU announces that billions and billions of data are now “open” again, but this is not gold, at least not in the form of nicely minted gold coins: it is gold dust and nuggets found in the muddy banks of chilly rivers. There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.&lt;/p>
&lt;figure id="figure-there-is-no-rush-for-it-because-panning-out-its-value-requires-a-lot-of-hours-of-hard-work-our-goal-is-to-automate-this-work-to-make-open-data-usable-at-scale-even-in-trustworthy-ai-solutions">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions." srcset="
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp 400w,
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_faa00e96d3d0b700cfcf1daa513f3ad2.webp 760w,
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.
&lt;/figcaption>&lt;/figure>
&lt;p>Most open data is not public: it is not downloadable from the Internet. In EU parlance, “open” only means a legal entitlement to get access to it. And even in the rare cases when data is both open and public, it is often mired in data quality issues. We are working on the prototypes of a data-as-service and research-as-service built with open-source statistical software that taps into various and often neglected open data sources.&lt;/p>
&lt;p>We are in the prototype phase in June, and we intend to have a well-functioning service by the time of the conference. Because we are working only with open-source software elements, our technological readiness level is already very high. The novelty of our process is that we are trying to further develop and integrate a few open-source technology items into technologically and financially sustainable data-as-service and even research-as-service solutions.&lt;/p>
&lt;figure id="figure-our-review-of-about-80-eu-un-and-oecd-data-observatories-reveals-that-most-of-them-do-not-use-these-organizationss-open-data---instead-they-use-various-and-often-not-well-processed-proprietary-sources">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;#39; open data - instead they use various, and often not well processed, proprietary sources." srcset="
/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_0079ea9844f6c5e52b52fd0e627467a2.webp 400w,
/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_ecd6d08ba5e9bac19c8173546f036651.webp 760w,
/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_0079ea9844f6c5e52b52fd0e627467a2.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;rsquo; open data - instead they use various, and often not well processed, proprietary sources.
&lt;/figcaption>&lt;/figure>
&lt;p>We are taking a new and modern approach to the &lt;code>data observatory&lt;/code> concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science. Various UN and OECD bodies, and particularly the European Union, support or maintain more than 60 data observatories, or permanent data collection and dissemination points, but even these do not use the open data of these organizations and their members. We are building open-source data observatories, which run open-source statistical software that automatically processes and documents reusable public sector data (from public transport, meteorology, tax offices, taxpayer funded satellite systems, etc.) and reusable scientific data (from EU taxpayer funded research) into new, high quality statistical indicators.&lt;/p>
&lt;figure id="figure-we-are-taking-a-new-and-modern-approach-to-the-data-observatory-concept-and-modernizing-it-with-the-application-of-21st-century-data-and-metadata-standards-the-new-results-of-reproducible-research-and-data-science">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/slides/automated_observatory_value_chain.jpg" alt="We are taking a new and modern approach to the ‘data observatory’ concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
We are taking a new and modern approach to the ‘data observatory’ concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science
&lt;/figcaption>&lt;/figure>
&lt;ul>
&lt;li>We are building various open-source data collection tools in R and Python to bring up data from big data APIs and legally open, but not public, and not well served data sources. For example, we are working on capturing representative data from the Spotify API or creating harmonized datasets from the Eurobarometer and Afrobarometer survey programs.&lt;/li>
&lt;li>Open data is usually not public; whatever is legally accessible is usually not ready to use for commercial or scientific purposes. In Europe, almost all taxpayer funded data is legally open for reuse, but it is usually stored in heterogeneous formats, processed for an original governmental or scientific need, and documented to varying and often low standards. Our expert data curators are looking for new data sources that should be (re-) processed and re-documented to be usable for a wider community. We would like to introduce our service flow, which touches upon many important aspects of data scientist, data engineer and data curatorial work.&lt;/li>
&lt;li>We believe that even such generally trusted data sources as Eurostat often need to be reprocessed, because various legal and political constraints do not allow the common European statistical services to provide optimal quality data – for example, on the regional and city levels.&lt;/li>
&lt;li>With &lt;a href="https://reprex-next.netlify.app/authors/ropengov/">rOpenGov&lt;/a> and other partners, we are creating open-source statistical software in R to re-process these heterogeneous and low-quality data into tidy statistical indicators, and to automatically validate and document them.&lt;/li>
&lt;li>We are carefully documenting and releasing administrative, processing, and descriptive metadata, following international metadata standards, to make our data easy to find and easy to use for data analysts.&lt;/li>
&lt;li>We are automatically creating depositions and authoritative copies marked with an individual digital object identifier (DOI) to maintain data integrity.&lt;/li>
&lt;li>We are building simple databases and supporting APIs that release the data without restrictions, in a tidy format that is easy to join with other data, or easy to join into databases, together with standardized metadata.&lt;/li>
&lt;li>We maintain observatory websites (see: &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>, &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a>, &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data Observatory&lt;/a>) where not only the data is available, but we provide tutorials and use cases to make it easier to use them. Our mission is to show a modern, 21st century reimagination of the data observatory concept developed and supported by the UN, EU and OECD, and we want to show that modern reproducible research and open data could make the existing 60 data observatories and the planned new ones grow faster into data ecosystems.&lt;/li>
&lt;/ul>
&lt;p>We are working around the open collaboration concept, which is well-known in open source software development and reproducible science, but we try to make this agile project management methodology more inclusive, and include data curators, and various institutional partners into this approach. Based around our early-stage startup, Reprex, and the open-source developer community rOpenGov, we are working together with other developers, data scientists, and domain specific data experts in climate change and mitigation, antitrust and innovation policies, and various aspects of the music and film industry.&lt;/p>
&lt;figure id="figure-our-open-collaboration-is-truly-open-new-data-curatorsauthorscuratordevelopersauthorsdeveloper-and-service-designersauthorsteam-even-volunteers-and-citizen-scientists-are-welcome-to-join">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our open collaboration is truly open: new [data curators](/authors/curator/), [developers](/authors/developer/) and [service designers](/authors/team/), even volunteers and citizen scientists are welcome to join." srcset="
/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_a07a8e618fa7317f6f8256b9a334262e.webp 400w,
/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_3a4ae7f72478fd880961b08e1f7075dd.webp 760w,
/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_a07a8e618fa7317f6f8256b9a334262e.webp"
width="760"
height="427"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our open collaboration is truly open: new &lt;a href="https://reprex-next.netlify.app/authors/curator/">data curators&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer/">developers&lt;/a> and &lt;a href="https://reprex-next.netlify.app/authors/team/">service designers&lt;/a>, even volunteers and citizen scientists are welcome to join.
&lt;/figcaption>&lt;/figure>
&lt;p>Our open collaboration is truly open: new &lt;a href="https://reprex-next.netlify.app/authors/curator/">data curators&lt;/a>, data scientists and data engineers are welcome to join. We develop open-source software in an agile way, so you can join in with an intermediate programming skill to build unit tests or add new functionality, and if you are a beginner, you can start with documentation and testing our tutorials. For business, policy, and scientific data analysts, we provide unexploited, exciting new datasets. Advanced developers can &lt;a href="https://reprex-next.netlify.app/authors/developer/">join&lt;/a> our development team: the statistical data creation is mainly made in the R language, and the service infrastructure in Python and Go components.&lt;/p></description></item><item><title>There are Numerous Advantages of Switching from a National Level of the Analysis to a Sub National Level</title><link>https://reprex-next.netlify.app/post/2021-06-16-regions-release/</link><pubDate>Wed, 16 Jun 2021 12:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-16-regions-release/</guid><description>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/package_screenshots/regions_017_169.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;p>The new version of our &lt;a href="https://ropengov.org/" target="_blank" rel="noopener">rOpenGov&lt;/a> R package
&lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions&lt;/a> was released today on
CRAN. This package is one of the engines of our experimental open
data-as-service &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data
Observatory&lt;/a>, &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data
Observatory&lt;/a> and &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music
Observatory&lt;/a> prototypes, which aim to
place open data packages into open-source applications.&lt;/p>
&lt;p>In international comparison, the use of nationally aggregated indicators
often has many disadvantages: they exhibit very different levels of
homogeneity, and the data is often very limited in the number of observations
for a cross-sectional analysis. When comparing European countries, a few
missing cases can limit the cross-section of countries to around 20
cases, which rules out the use of many analytical methods. Working with
sub-national statistics has many advantages: the similarity of the
aggregation level and high number of observations can allow more precise
control of model parameters and errors, and the number of observations
grows from 20 to 200-300.&lt;/p>
&lt;figure id="figure-the-change-from-national-to-sub-national-level-comes-with-a-huge-data-processing-price-internal-administrative-boundaries-their-names-codes-codes-change-very-frequently">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="The change from national to sub-national level comes with a huge data processing price: internal administrative boundaries, their names and codes change very frequently." srcset="
/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_df043b13fb62aa7b45aa15fad51f4229.webp 400w,
/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_09a0d6124e334c5f1727420a059512a9.webp 760w,
/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_df043b13fb62aa7b45aa15fad51f4229.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
The change from national to sub-national level comes with a huge data processing price: internal administrative boundaries, their names and codes change very frequently.
&lt;/figcaption>&lt;/figure>
&lt;p>Yet the change from national to sub-national level comes with a huge
data processing price. National boundaries are relatively stable,
with only a handful of changes in each recent decade; a change of
national boundaries requires a more-or-less global consensus. But states
are free to change their internal administrative boundaries, and they do
so frequently. This means that the names, identification codes
and boundary definitions of sub-national regions change very frequently.
Joining data from different sources and different years can be very
difficult.&lt;/p>
&lt;figure id="figure-our-regions-r-packagehttpsregionsdataobservatoryeu-helps-the-data-processing-validation-and-imputation-of-sub-national-regional-datasets-and-their-coding">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our regions R package helps the data processing, validation and imputation of sub-national, regional datasets and their coding." srcset="
/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_65df57cf4311bb2623535a1a5be044c0.webp 400w,
/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_81a53fd42fac7f0c3fe4e1a89d5b7892.webp 760w,
/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_65df57cf4311bb2623535a1a5be044c0.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our &lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions R package&lt;/a> helps the data processing, validation and imputation of sub-national, regional datasets and their coding.
&lt;/figcaption>&lt;/figure>
&lt;p>Switching from a national level of analysis to a sub-national level
has numerous advantages, but it comes with a huge price in data
processing, validation and imputation. The
&lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions&lt;/a> package aims to help with this
process.&lt;/p>
&lt;p>You can review the problem, and the code that created the two map
comparisons, in the &lt;a href="https://regions.dataobservatory.eu/articles/maping.html" target="_blank" rel="noopener">Maping Regional Data, Maping Metadata
Problems&lt;/a>
vignette article of the package. A more detailed problem description can
be found in &lt;a href="https://regions.dataobservatory.eu/articles/Regional_stats.html" target="_blank" rel="noopener">Working With Regional, Sub-National Statistical
Products&lt;/a>.&lt;/p>
&lt;p>This package is an offspring of the
&lt;a href="https://ropengov.github.io/eurostat/" target="_blank" rel="noopener">eurostat&lt;/a> package on
&lt;a href="https://ropengov.github.io/" target="_blank" rel="noopener">rOpenGov&lt;/a>. It started as a tool to
validate and re-code regional Eurostat statistics, but it aims to be a
general solution for all sub-national statistics. It will be developed
in parallel with other rOpenGov packages.&lt;/p>
&lt;h2 id="get-the-package">Get the Package&lt;/h2>
&lt;p>You can install the development version from
&lt;a href="https://github.com/" target="_blank" rel="noopener">GitHub&lt;/a> with:&lt;/p>
&lt;pre>&lt;code>devtools::install_github(&amp;quot;rOpenGov/regions&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>or the released version from CRAN:&lt;/p>
&lt;pre>&lt;code>install.packages(&amp;quot;regions&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>You can review the complete package documentation on
&lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions.dataobservatory.eu&lt;/a>. If
you find any problems with the code, please raise an issue on
&lt;a href="https://github.com/rOpenGov/regions" target="_blank" rel="noopener">GitHub&lt;/a>. Pull requests are welcome
if you agree with the &lt;a href="https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html" target="_blank" rel="noopener">Contributor Code of
Conduct&lt;/a>.&lt;/p>
&lt;p>If you use &lt;code>regions&lt;/code> in your work, please cite the
package as:
Daniel Antal, Kasia Kulma, Istvan Zsoldos, &amp;amp; Leo Lahti. (2021, June 16). regions (Version 0.1.7). CRAN. &lt;a href="https://doi.org/10.5281/zenodo.4965909">https://doi.org/10.5281/zenodo.4965909&lt;/a>&lt;/p>
&lt;p>&lt;a href="https://cran.r-project.org/package=regions" target="_blank" rel="noopener">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.r-pkg.org/badges/version/regions" alt="CRAN_Status_Badge" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/a>&lt;/p>
&lt;h2 id="join-us">Join us&lt;/h2>
&lt;p>&lt;em>Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>. More interested in environmental impact analysis? Try our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href="https://music.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> team!&lt;/em>&lt;/p>
&lt;p>&lt;a href="https://twitter.com/intent/follow?screen_name=EconDataObs" target="_blank" rel="noopener">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://img.shields.io/twitter/follow/EconDataObs.svg?style=social" alt="Follow EconDataObs" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/a>&lt;/p></description></item><item><title>Open Data is Like Gold in the Mud Below the Chilly Waves of Mountain Rivers</title><link>https://reprex-next.netlify.app/post/2021-06-10-founder-daniel-antal/</link><pubDate>Thu, 10 Jun 2021 07:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-10-founder-daniel-antal/</guid><description>
&lt;figure id="figure-open-data-is-like-gold-in-the-mud-below-the-chilly-waves-of-mountain-rivers-panning-it-out-requires-a-lot-of-patience-or-a-good-machine">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine." srcset="
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp 400w,
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_faa00e96d3d0b700cfcf1daa513f3ad2.webp 760w,
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>As the founder of the automated data observatories that are part of Reprex’s core activities, what type of data do you usually use in your day-to-day work?&lt;/strong>&lt;/p>
&lt;p>The automated data observatories are results of syndicated research, data pooling, and other creative solutions to the problem of missing or hard-to-find data. The music industry is a very fragmented industry, where market research budgets and data are scattered among tens of thousands of small organizations in Europe. Working for the music and film industry as a data analyst and economist was always a pain because most of the effort went into trying to find any data that could be analyzed. I spent most of the last 7-8 years trying to find any sort of information, from satellites to government archives, that could be formed into actionable data. I see three big sources of information: textual, numeric, and continuous recordings from on-site, off-site, and satellite sensors. I am much better with numbers than with natural language processing, and I am &lt;a href="https://greendeal.dataobservatory.eu/post/2021-06-06-tutorial-cds/" target="_blank" rel="noopener">improving with sensory sources&lt;/a>. But technically, I can mint any systematic information (the text of an old book, a satellite image, or an opinion poll) into datasets.&lt;/p>
&lt;p>&lt;strong>For you, what would be the ultimate dataset, or datasets that you would like to see in the Economy Data Observatory?&lt;/strong>&lt;/p>
&lt;p>I am a data scientist now, but I used to be a regulatory economist, and I have worked a lot with competition policy and monopoly regulation issues. Our observatories can automatically monitor market and environmental processes, which would allow us to get into computational antitrust. Peter Ormosi, our competition curator, is particularly &lt;a href="https://economy.dataobservatory.eu/post/2021-06-02-data-curator-peter-ormosi/" target="_blank" rel="noopener">interested in&lt;/a> killer acquisitions: approved mergers of big companies that end up piling up patents that are not used. I am more interested in systematically describing, in real time, which markets are getting more concentrated and which are getting more competitive. Does data concentration coincide with market concentration?&lt;/p>
&lt;p>To bring an example from the realm of our &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>, which was a prototype for this one, I have been working for some time on creating streaming volume and price indexes, like the &lt;em>Dow Jones Industrial Average&lt;/em> or the various bond market indexes, that tell us more about price, demand, and potential revenue in music streaming markets all over the world. We did a first take on this in the &lt;a href="https://ceereport2020.ceemid.eu/" target="_blank" rel="noopener">Central European Music Industry Report&lt;/a> and recently we iterated on the model for the &lt;em>UK Intellectual Property Office&lt;/em> and the &lt;em>UK Music Creators’ Earnings&lt;/em> project. We want to take this further to create a pan-European streaming market index, and we will probably be the first to actually be able to report on music market concentrations, and to do so more or less in real time.&lt;/p>
&lt;figure id="figure-we-would-like-to-further-developer-our-20-country-streaming-indexeshttpsceereport2020ceemideumarkethtmlceemid-ci-volume-indexes-into-a-global-music-market-index">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="We would like to further develop our 20-country streaming indexes into a global music market index." srcset="
/media/img/blogposts_2021/medianvalue-1_hu5941f179e15628adbbb6d4dc0db86cd1_46382_59d954e926db1ce3ce9376aac454a3aa.webp 400w,
/media/img/blogposts_2021/medianvalue-1_hu5941f179e15628adbbb6d4dc0db86cd1_46382_75d58bfbbfae9d25c5551030d6d4206a.webp 760w,
/media/img/blogposts_2021/medianvalue-1_hu5941f179e15628adbbb6d4dc0db86cd1_46382_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/medianvalue-1_hu5941f179e15628adbbb6d4dc0db86cd1_46382_59d954e926db1ce3ce9376aac454a3aa.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
We would like to further develop our 20-country &lt;a href="https://ceereport2020.ceemid.eu/market.html#ceemid-ci-volume-indexes">streaming indexes&lt;/a> into a global music market index.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>Is there a number or piece of information that recently surprised you? If so, what was it?&lt;/strong>&lt;/p>
&lt;p>There were a few numbers that surprised me, and some of them were brought up by our observatory teams. Karel &lt;a href="post/2021-06-08-data-curator-karel-volckaert/">talks&lt;/a> about the fact that not all green energy is actually green: many hydropower stations contribute to the greenhouse effect rather than reduce it. Annette brought up the growing interest in the &lt;a href="https://reprex-next.netlify.app/post/2021-06-09-team-annette-wong/">Dalmatian breed&lt;/a> after the Disney &lt;em>101 Dalmatians&lt;/em> movies, and it reminded me of the astonishing growth in interest in chess sets, chess tutorials, and platform subscriptions after the success of Netflix’s &lt;em>The Queen’s Gambit&lt;/em>.&lt;/p>
&lt;figure id="figure-the-queens-gambit-chess-boom-moves-online-by-rachael-dottle-on-bloombergcomhttpswwwbloombergcomgraphics2020-chess-boom">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="‘The Queen’s Gambit’ Chess Boom Moves Online, by Rachael Dottle on bloomberg.com" srcset="
/media/img/blogposts_2021/queens_gambit_bloomberg_hub50434a1789646b36daf41ad10e65b52_92708_4fc47acea402086dd3891772877289db.webp 400w,
/media/img/blogposts_2021/queens_gambit_bloomberg_hub50434a1789646b36daf41ad10e65b52_92708_b60a154be5ab781fb70d16f62f39966c.webp 760w,
/media/img/blogposts_2021/queens_gambit_bloomberg_hub50434a1789646b36daf41ad10e65b52_92708_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/queens_gambit_bloomberg_hub50434a1789646b36daf41ad10e65b52_92708_4fc47acea402086dd3891772877289db.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
&lt;em>‘The Queen’s Gambit’ Chess Boom Moves Online&lt;/em> by Rachael Dottle on &lt;a href="https://www.bloomberg.com/graphics/2020-chess-boom/" target="_blank" rel="noopener">bloomberg.com&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;p>Annette is talking about the importance of cultural influencers, and on that theme, what could be more exciting than the fact that &lt;a href="https://www.netflix.com/nl-en/title/80234304" target="_blank" rel="noopener">Netflix’s biggest success&lt;/a> so far is not a detective series or a soap opera but a coming-of-age story of a female chess prodigy? Intelligence is sexy, and we are in the intelligence business.&lt;/p>
&lt;p>But to mention a more serious and sobering number, I recently read with surprise that there are &lt;a href="https://www.theguardian.com/society/2021/may/27/number-of-smokers-has-reached-all-time-high-of-11-billion-study-finds" target="_blank" rel="noopener">more people smoking cigarettes&lt;/a> on Earth in 2021 than in 1990. Population growth in developing countries has more than offset the shrinking number of smokers in developed countries. While I live in Europe, where smoking is strongly declining, it reminds me that Europe’s population is a small part of the world. We cannot take for granted that our home-grown experiences about the world are globally valid.&lt;/p>
&lt;p>&lt;strong>Do you have a good example of really good, or really bad use of data?&lt;/strong>&lt;/p>
&lt;p>&lt;a href="https://fivethirtyeight.com/" target="_blank" rel="noopener">FiveThirtyEight.com&lt;/a> had a wonderful podcast series, produced by Jody Avirgan, called &lt;em>What’s the Point&lt;/em>. It is exactly about good and bad uses of data, and each episode is super interesting. Maybe the most memorable is &lt;em>Why the Bronx Really Burned&lt;/em>. New York City tried to measure fire response times, identify redundancies in service, and close or re-allocate fire stations accordingly. What resulted, though, was a perfect storm of bad data: The methodology was flawed, the analysis was rife with biases, and the results were interpreted in a way that stacked the deck against poorer neighborhoods. It is similar to many stories told in a very compelling argument by Catherine D’Ignazio and Lauren F. Klein in their much celebrated book, &lt;em>Data Feminism&lt;/em>. Usually, the bad use of data starts with a bad data collection practice. Data analysts in corporations, NGOs, public policy organizations and even in science usually analyze the data that is available.&lt;/p>
&lt;p>&lt;em>You can find these examples, together with many more that our contributors recommend, in the motivating examples of the &lt;a href="https://contributors.dataobservatory.eu/data-curators.html#create-new-datasets" target="_blank" rel="noopener">Create New Datasets&lt;/a> and &lt;a href="https://contributors.dataobservatory.eu/data-curators.html#critical-attitude" target="_blank" rel="noopener">Remain Critical&lt;/a> parts of our onboarding material. We hope that more and more professionals and citizen scientists will help us to create high-quality and open data.&lt;/em>&lt;/p>
&lt;p>The real power lies in designing a data collection program. A consistent data collection program usually requires an investment that only powerful organizations, such as government agencies, very large corporations, or the richest universities can afford. You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling.&lt;/p>
&lt;figure id="figure-you-cannot-really-analyze-the-data-that-is-not-collected-and-recorded-and-usually-what-is-not-recorded-is-more-interesting-than-what-is-our-observatories-want-to-democratize-the-data-collection-process-and-make-it-more-available-more-shared-with-research-automation-and-pooling">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/slides/value_added_from_automation.png" alt="You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>From your perspective, what do you see being the greatest problem with open data in 2021?&lt;/strong>&lt;/p>
&lt;p>I have been involved with open data policies since 2004. The problem has not changed much: more and more data are available from governmental and scientific sources, but in a form that makes them useless. Data without a clear description and clear processing information are useless for analytical purposes: they cannot be integrated with other data, and they cannot be trusted or verified. If researchers or government entities that fall under the &lt;a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2019.172.01.0056.01.ENG" target="_blank" rel="noopener">Open Data Directive&lt;/a> release data for reuse without descriptive or processing metadata, it is almost as if they did not release anything. You need this additional information to make valid analyses of the data, and reverse-engineering it may cost more than re-collecting the data in a properly documented process. Our developers, particularly &lt;a href="https://reprex-next.netlify.app/post/2021-06-04-developer-leo-lahti/">Leo&lt;/a> and &lt;a href="post/2021-06-07-data-curator-pyry-kantanen/">Pyry&lt;/a>, talk eloquently about why you have to be careful even with governmental statistical products, and constantly be on the lookout for data quality problems.&lt;/p>
&lt;figure id="figure-our-apidata-is-not-only-publishing-descriptive-and-processing-metadata-alongside-with-our-data-but-we-also-make-all-critical-elements-of-our-processing-code-available-for-peer-review-on-ropengovauthorsropengov">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/EDO_API_metadata_table.png" alt="Our API not only publishes descriptive and processing metadata alongside our data, but we also make all critical elements of our processing code available for peer review on rOpenGov." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our &lt;a href="https://reprex-next.netlify.app/#data">API&lt;/a> not only publishes descriptive and processing metadata alongside our data, but we also make all critical elements of our processing code available for peer review on &lt;a href="https://reprex-next.netlify.app/authors/ropengov/">rOpenGov&lt;/a>.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>What do you think the Economy Data Observatory, and our other automated observatories do, to make open data more credible in the European economic policy community and be accepted as verified information?&lt;/strong>&lt;/p>
&lt;p>Most of our work is in research automation, and a very large part of our effort goes into reverse-engineering missing descriptive and processing metadata. In a way, I like to compare our working method to that of the open-source intelligence platform &lt;a href="https://www.bellingcat.com" target="_blank" rel="noopener">Bellingcat&lt;/a>. They were able to use publicly available, &lt;a href="https://www.bellingcat.com/category/resources/case-studies/?fwp_tags=mh17" target="_blank" rel="noopener">scattered information from satellites and social media&lt;/a> to identify each member of the Russian military unit that illegally entered the territory of Ukraine and shot down Malaysia Airlines flight MH17 with 298, mainly Dutch, civilians on board.&lt;/p>
&lt;figure id="figure-how-we-create-value-for-research-oriented-consultancies-public-policy-institutes-university-research-teams-journalists-or-ngos">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/slides/automated_observatory_value_chain.jpg" alt="How we create value for research-oriented consultancies, public policy institutes, university research teams, journalists or NGOs." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
How we create value for research-oriented consultancies, public policy institutes, university research teams, journalists or NGOs.
&lt;/figcaption>&lt;/figure>
&lt;p>We do not do such investigations, but we work very similarly in how we filter through many data sources and attempt to verify them when their descriptions and processing histories are unknown. In recent years, we were able to restore the metadata of many European and African open data surveys, economic and environmental impact data, and many other open datasets that had been lying around for years without users.&lt;/p>
&lt;p>Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine. I think we will come to as surprising and strong findings as Bellingcat, but we are not focusing on individual events and stories, but on social and environmental processes and changes.&lt;/p>
&lt;figure id="figure-join-our-open-collaboration-economy-data-observatory-team-as-a-data-curatorauthorscurator-developerauthorsdeveloper-or-business-developerauthorsteam-or-share-your-data-in-our-public-repository-economy-data-observatory-on-zenodohttpszenodoorgcommunitieseconomy_observatory">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/edo_and_zenodo.png" alt="Join our open collaboration Economy Data Observatory team as a data curator, developer or business developer, or share your data in our public repository, Economy Data Observatory on Zenodo." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>, or share your data in our public repository &lt;a href="https://zenodo.org/communities/economy_observatory/" target="_blank" rel="noopener">Economy Data Observatory on Zenodo&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;h2 id="join-us">Join us&lt;/h2>
&lt;p>&lt;em>Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>. More interested in environmental impact analysis? Try our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href="https://music.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> team!&lt;/em>&lt;/p></description></item></channel></rss>