<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>open-science | Reprex</title><link>https://reprex-next.netlify.app/tag/open-science/</link><atom:link href="https://reprex-next.netlify.app/tag/open-science/index.xml" rel="self" type="application/rss+xml"/><description>open-science</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sun, 06 Feb 2022 14:00:00 +0000</lastBuildDate><image><url>https://reprex-next.netlify.app/media/icon_hub9491570ac57158c0eeecc95c95b13e5_20247_512x512_fill_lanczos_center_3.png</url><title>open-science</title><link>https://reprex-next.netlify.app/tag/open-science/</link></image><item><title>Open Policy Analysis</title><link>https://reprex-next.netlify.app/project/opa/</link><pubDate>Sun, 06 Feb 2022 14:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/project/opa/</guid><description>&lt;p>Our ambition is to truly maximize transparency, (re)usability, and scientific, policy, and business impact while embracing the best practices laid out in the recommendations of the &lt;em>Reproducibility of scientific results scoping report&lt;/em> and the &lt;em>Progress on Open Science: Towards a Shared Research Knowledge System&lt;/em> policy documents of the European Commission&amp;rsquo;s DG Research &amp;amp; Innovation, as well as the best practices outlined in the evidence-based &lt;em>Knowledge4Policy&lt;/em> &lt;a href="https://knowledge4policy.ec.europa.eu/home_en" target="_blank" rel="noopener">K4P&lt;/a> platform of the European Commission.&lt;/p>
&lt;p>For the first time in Europe, we will apply and contextualize the &lt;a href="http://www.bitss.org/wp-content/uploads/2019/03/OPA-Guidelines.pdf" target="_blank" rel="noopener">Open Policy Analysis Guidelines&lt;/a> (OPA Guidelines) in our &lt;a href="https://reprex-next.netlify.app/projects/openmuse">OpenMuse&lt;/a> project.&lt;/p>
&lt;p>The &lt;code>Open Policy Analysis Guidelines&lt;/code> grew out of several initiatives in research transparency with the aim of maximizing benefits in the context of the &lt;a href="https://www.congress.gov/bill/115th-congress/house-bill/4174" target="_blank" rel="noopener">Foundations for Evidence-Based Policymaking Act of 2018&lt;/a> in the United States. By relying not only on the best European practices but also on trans-Atlantic experience, we want to make the most of the opportunities offered by the European &lt;a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32019L1024">Open Data Directive of 2019&lt;/a>. For our partners, this will mean not only dramatically increased data availability, together with increased quality assurance and transparency in our work, but also immediate data access.&lt;/p>
&lt;p>Our new software will continue to run in the cloud, depositing all of our findings&amp;mdash;&lt;em>Findable&lt;/em>, &lt;em>Accessible&lt;/em>, &lt;em>Interoperable&lt;/em> and &lt;em>Reusable&lt;/em> digital assets, including our well-designed and user-tested indicators in 41 data gap fields&amp;mdash;into our &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>, which already hosts a &lt;a href="https://api.music.dataobservatory.eu/" target="_blank" rel="noopener">modern REST API&lt;/a> similar to the Eurostat REST API.&lt;/p>
&lt;table class="table table-hover table-condensed" style="margin-left: auto; margin-right: auto;">
&lt;thead>
&lt;tr>
&lt;th style="text-align:left;">
Layer
&lt;/th>
&lt;th style="text-align:left;">
Goal
&lt;/th>
&lt;th style="text-align:left;">
Target
&lt;/th>
&lt;th style="text-align:left;">
Example
&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left;">
Open Output
&lt;/td>
&lt;td style="text-align:left;">
Ensure unified output
&lt;/td>
&lt;td style="text-align:left;">
We comply with the level 3 requirements, and we will create a showcase
of how best to do this following EU open science recommendations.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://zenodo.org/record/5917742#.YflAK-rMLIU" style=" ">See
our example.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left;">
Open Output
&lt;/td>
&lt;td style="text-align:left;">
Establish a clear link between input and output
&lt;/td>
&lt;td style="text-align:left;">
We will produce more than 100 outputs, some only as indicators and
others in the form of policy analysis. We will comply with levels 1, 2
and 3 as necessary.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://music.dataobservatory.eu/publication/listen_local_2020/" style=" ">Our
affiliated music industry partners will create case studies with
interactive tools (level 3). See our Slovak case study, which came with a
Shiny App that analyzed music recommendations.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left;">
Open Analysis
&lt;/td>
&lt;td style="text-align:left;">
Provide clear accounts of all methodological procedures in a way that is
easily interpreted by an informed reader.
&lt;/td>
&lt;td style="text-align:left;">
We accomplish level 3 by placing clearly documented code into
a dynamic document or open notebook.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://music.dataobservatory.eu/post/2021-11-06-indicator_value_added/" style=" ">See
for example our blogpost on automatic forecasting for the music
industry.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left;">
Open Analysis
&lt;/td>
&lt;td style="text-align:left;">
Share raw (or analytic) data and materials in a way that the analysis is
reproducible with minimal effort.
&lt;/td>
&lt;td style="text-align:left;">
We will accomplish level 3 through trusted repositories following EU
recommendations. We will use the Zenodo repository developed by CERN and
the EU’s OpenAIRE project.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://zenodo.org/communities/music_observatory/" style=" ">See
our solution on Zenodo.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left;">
Open Analysis
&lt;/td>
&lt;td style="text-align:left;">
Share an open report that includes clear accounts of all methodological
procedures, data, and assumptions.
&lt;/td>
&lt;td style="text-align:left;">
We would like to go beyond the level 3 requirements of the OPA by
using standardized documentation languages, such as SDMX statistical
metadata and its standardized codebooks, and to comply with both the
Dublin Core and the DataCite extended, recommended standardized reporting.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://music.dataobservatory.eu/publication/mce_empirical_streaming_2021/" style=" ">See
our example An Empirical Analysis of Music Streaming Revenues and Their
Distribution created for the UK Intellectual Property Office’s
evidence-based policy effort in music streaming.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left;">
Open Materials
&lt;/td>
&lt;td style="text-align:left;">
Standardize the file structure so that materials are organized in a way
that is accessible to an informed reader.
&lt;/td>
&lt;td style="text-align:left;">
We comply with the level 3 requirements. Our version-controlled output
is on GitHub.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://github.com/dataobservatory-eu/music-competition" style=" ">See
an example on GitHub.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left;">
Open Materials
&lt;/td>
&lt;td style="text-align:left;">
Label and document each input, including data, research, and guesswork.
&lt;/td>
&lt;td style="text-align:left;">
We will go beyond level 3 requirements, because we want to make sure
that our labelling and documentation are interoperable, and we apply
various metadata standards for this purpose.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://music.dataobservatory.eu/post/2021-11-08-indicator_findable/" style=" ">See
our example explaining how we document our datasets in our API.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left;">
Open Materials
&lt;/td>
&lt;td style="text-align:left;">
Ensure that code/spreadsheets are reproducible.
&lt;/td>
&lt;td style="text-align:left;">
All our spreadsheets are machine-generated for the convenience of
users of spreadsheet applications, but everything can be run with a
click, which accomplishes level 3 and maintains the convenience of
levels 1-2 for the user. We go further by creating authoritative copies
of each dataset and visualization with DOIs. We also produce an API
which gives programmatic or single table access to both the data and
standardized codebooks.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://api.music.dataobservatory.eu/" style=" ">See our
API. All our datasets are described in detail on Zenodo and Figshare,
too.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left;">
Open Materials
&lt;/td>
&lt;td style="text-align:left;">
Use a version control strategy.
&lt;/td>
&lt;td style="text-align:left;">
We use Git version control, and we employ various repositories and
project documentation tools on GitHub. These are linked with the Zenodo
EU open repository and our data API.
&lt;/td>
&lt;td style="text-align:left;">
&lt;a href="https://retroharmonize.dataobservatory.eu/articles/cap.html" style=" ">See
our example integration.&lt;/a>
&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table></description></item><item><title>Research &amp; Analysis: Music Creators’ Earnings in the Digital Era</title><link>https://reprex-next.netlify.app/post/2021-09-23-mce_reports/</link><pubDate>Thu, 23 Sep 2021 08:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-09-23-mce_reports/</guid><description>&lt;p>Reprex, with its &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory team&lt;/a>, was commissioned to prepare an analysis of the justified and unjustified differences in music creators’ earnings. We have posted our most important findings in an earlier blogpost (&lt;a href="https://music.dataobservatory.eu/post/2021-06-18-mce/" target="_blank" rel="noopener">Music Creators’ Earnings in the Streaming Era. United Kingdom Research Cooperation With the Digital Music Observatory&lt;/a>).&lt;/p>
&lt;p>The UK Intellectual Property Office has published the entire report on the music creators’ earnings, and we have made our detailed analysis available in a side-publication. Reprex also signed an agreement with the researchers of the Music Creators’ Earnings project to deposit all data published in the report in the Digital Music Observatory, and to promote the building of the observatory further.&lt;/p>
&lt;p>The research questions asked in this report are related to the &lt;a href="https://www.gov.uk/government/publications/music-creators-earnings-in-the-digital-era" target="_blank" rel="noopener">Music Creators&amp;rsquo; Earnings Project&lt;/a> (MCE), exploring issues concerning equitable remuneration and earnings distributions. We were tasked with providing a longitudinal analysis of earnings development and relating our findings to equitable remuneration. Our work started from a very broadly defined problem: how much money music creators (rightsholders) earn from streaming, how these earnings are distributed, and how the earnings and their distribution have developed during the last decade.&lt;/p>
&lt;p>The highly globalized music industry generates two important international reports, as well as several national reports, but these are not suitable for the analysis of the typical or average rightsholder, nor for small labels and publishers who do not represent a large and internationally diversified portfolio of music works or recordings. Copyright and neighboring right revenues are collected in national jurisdictions. Because British artists are almost never constrained by their use of language, and the UK music industry is highly competitive in the global music markets, even relatively lesser-known rightsholders earn revenues from dozens of national markets. The lack of market information on music sales volumes and prices for each jurisdiction, together with the unaccounted-for national, domestic, and foreign revenues, makes the analysis of rightsholders’ earnings, or of the economics of a certain distribution channel like music streaming or media platforms, impossible.&lt;/p>
&lt;figure id="figure-the-effect-of-international-diversification-on-revenues---a-combination-of-international-price-differences-and-exchange-rate-fluctuations">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/reports/mce/Effect_International_Diversification_Revenues_Coplot.png" alt="The Effect of International Diversification on Revenues - a combination of international price differences and exchange rate fluctuations." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
The Effect of International Diversification on Revenues - a combination of international price differences and exchange rate fluctuations.
&lt;/figcaption>&lt;/figure>
&lt;p>While total earnings are reported by international and national organizations, they hide five important economic variables: changes in sales volumes, changes in prices, market share in various national jurisdictions (which have their own volume and price movements), the exchange rates applied, and the share of the repertoire exploited. Even worse, the global music industry has no comprehensive database of rightsholders, music works, and recordings – this is the data gap that we would like to fill with the Digital Music Observatory.&lt;/p>
&lt;p>Our &lt;a href="https://mce.dataobservatory.eu/" target="_blank" rel="noopener">report&lt;/a> highlights some important lessons. First, we show that in the era of global music sales platforms it is impossible to understand the economics of music streaming without international data harmonization and advanced surveying and sampling. Paradoxically, without careful adjustments for accruals, market shares in jurisdictions, and disaggregation of price and volume changes, the British industry cannot analyze its own economics because of its high level of integration into the global music economy. Furthermore, the replacement of former public performance, mechanical licensing, and private copying remunerations (which have been available to British rightsholders in their European markets for decades) with less valuable streaming licenses has left many rightsholders poorer. Making adjustments to the distribution system without modifying the definition of equitable remuneration rights or the pro-rata distribution scheme of streaming platforms opens up many conflicts while solving too few fundamental problems. Therefore, we suggest participation in international data harmonization and policy coordination to help regain the historical value of music.&lt;/p>
&lt;h2 id="context">Context&lt;/h2>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/blogposts_20121/dcms_economics_music_streaming.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;p>The idea of our Digital Music Observatory was brought to the UK policy debate on music streaming by the &lt;em>Written evidence submitted by The state51 Music Group&lt;/em> to the &lt;em>Economics of music streaming review&lt;/em> of the UK Parliament&amp;rsquo;s DCMS Committee&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>.&lt;/p>
&lt;p>The music industry requires a permanent market monitoring facility to win fights in competition tribunals, because it is increasingly disputing revenues with the world’s biggest data owners. This was precisely the role of the former CEEMID&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup> program, which was initiated by a group of collective management societies. Starting with three relatively data-poor countries, where data pooling allowed rightsholders to increase revenues, the CEEMID data collection program was extended in 2019 to 12 countries. The &lt;a href="https://ceereport2020.ceemid.eu/" target="_blank" rel="noopener">final regional report&lt;/a>, which followed the release of the detailed &lt;a href="https://music.dataobservatory.eu/publication/hungary_music_industry_2014/" target="_blank" rel="noopener">Hungarian&lt;/a>, &lt;a href="https://music.dataobservatory.eu/publication/slovak_music_industry_2019/" target="_blank" rel="noopener">Slovak&lt;/a> and &lt;a href="https://music.dataobservatory.eu/publication/private_copying_croatia_2019/" target="_blank" rel="noopener">Croatian reports&lt;/a> of CEEMID, was sponsored by Consolidated Independent (of the &lt;em>state51 music group&lt;/em>).&lt;/p>
&lt;p>CEEMID was eventually transformed into the &lt;em>Demo Music Observatory&lt;/em> in 2020&lt;sup id="fnref:3">&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref">3&lt;/a>&lt;/sup>, following the planned structure of the &lt;a href="https://dataandlyrics.com/post/2020-11-16-european-music-observatory-feasibility/" target="_blank" rel="noopener">European Music Observatory&lt;/a>, and validated in the world&amp;rsquo;s second-ranked university-backed incubator, the Yes!Delft AI+Blockchain Validation Lab. In 2021, under the final name &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>, it became open to any rightsholder or stakeholder organization or music research institute, and it is being launched with the help of the &lt;a href="https://dataandlyrics.com/post/2021-03-04-jump-2021/" target="_blank" rel="noopener">JUMP European Music Market Accelerator Programme&lt;/a>, which is co-funded by the Creative Europe Programme of the European Union.&lt;/p>
&lt;p>In December 2020, we started investigating how the music observatory concept could be introduced in the UK, and how our data and analytical skills could be used in the &lt;a href="https://digit-research.org/research/related-projects/music-creators-earnings-in-the-streaming-era/" target="_blank" rel="noopener">Music Creators’ Earnings in the Streaming Era&lt;/a> (in short: MCE) project, which is taking place in parallel with the heated political debates around the DCMS inquiry. After the &lt;em>state51 music group&lt;/em> gave permission for the UK Intellectual Property Office to reuse the data that was originally published as the experimental &lt;a href="https://ceereport2020.ceemid.eu/market.html#recmarket" target="_blank" rel="noopener">CEEMID-CI Streaming Volume and Revenue Indexes&lt;/a>, we reached a cooperation agreement between the MCE Project and the &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>. We provided a detailed historical analysis and computer simulation for the MCE Project, and we will host all the data of the &lt;em>Music Creators’ Earnings Report&lt;/em> in our observatory, hopefully no later than early July 2021.&lt;/p>
&lt;figure id="figure-the-digital-music-observatoryhttpsmusicdataobservatoryeu-contributes-to-the-music-creators-earnings-in-the-streaming-era-project-with-understanding-the-level-of-justified-and-unjustified-differences-in-rightsholder-earnings-and-putting-them-into-a-broader-music-economy-context">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/observatory_screenshots/dmo_opening_screen.png" alt="The Digital Music Observatory contributes to the Music Creators’ Earnings in the Streaming Era project by helping to understand the level of justified and unjustified differences in rightsholder earnings, and by putting them into a broader music economy context." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
The &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> contributes to the Music Creators’ Earnings in the Streaming Era project by helping to understand the level of justified and unjustified differences in rightsholder earnings, and by putting them into a broader music economy context.
&lt;/figcaption>&lt;/figure>
&lt;p>We started our cooperation with the two principal investigators of the project, &lt;a href="https://music.dataobservatory.eu/author/prof-david-hesmondhalgh/" target="_blank" rel="noopener">Prof David Hesmondhalgh&lt;/a> and &lt;a href="https://music.dataobservatory.eu/author/hyojung-sun/" target="_blank" rel="noopener">Dr Hyojung Sun&lt;/a>, back in April, and we will start releasing the findings and the data in July 2021.&lt;/p>
&lt;h2 id="join-us">Join us&lt;/h2>
&lt;p>&lt;em>Do you need high-quality data for your music business or institution? Are you a music researcher? Join our open collaboration Digital Music Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>.&lt;/em>&lt;/p>
&lt;h2 id="footnote-references">Footnote References&lt;/h2>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>state51 Music Group. 2020. “Written Evidence Submitted by The state51 Music Group. Economics of Music Streaming Review. Response to Call for Evidence.” UK Parliament website. &lt;a href="https://committees.parliament.uk/writtenevidence/15422/html/" target="_blank" rel="noopener">https://committees.parliament.uk/writtenevidence/15422/html/&lt;/a>.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>Artisjus, HDS, SOZA, and Candole Partners. 2014. “Measuring and Reporting Regional Economic Value Added, National Income and Employment by the Music Industry in a Creative Industries Perspective. Memorandum of Understanding to Create a Regional Music Database to Support Professional National Reporting, Economic Valuation and a Regional Music Study.”&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:3" role="doc-endnote">
&lt;p>Antal, Daniel. 2021. “Launching Our Demo Music Observatory.” &lt;em>Data &amp;amp; Lyrics&lt;/em>. Reprex. &lt;a href="https://dataandlyrics.com/post/2020-09-15-music-observatory-launch/" target="_blank" rel="noopener">https://dataandlyrics.com/post/2020-09-15-music-observatory-launch/&lt;/a>.&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>Including Indicators from Arab Barometer in Our Observatory</title><link>https://reprex-next.netlify.app/post/2021-06-28-arabbarometer/</link><pubDate>Mon, 28 Jun 2021 09:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-28-arabbarometer/</guid><description>&lt;p>&lt;em>A new version of the retroharmonize R package – which supports retrospective, ex post harmonization of survey data – was released yesterday after peer review on CRAN. It allows us to compare opinion polling data from the Arab Barometer with the Eurobarometer and Afrobarometer. This is the first version released in the rOpenGov community, a community of R package developers on open government data analytics and related topics.&lt;/em>&lt;/p>
&lt;p>Surveys are the most important data sources in social and economic
statistics – they ask people about their lives, their attitudes and
self-reported actions, or record data from companies and NGOs. Survey
harmonization makes survey data comparable across time and countries. It
is very important, because often we do not know without comparison if an
indicator value is &lt;em>low&lt;/em> or &lt;em>high&lt;/em>. If 40% of the people think that
&lt;em>climate change is a very serious problem&lt;/em>, it does not really tell us
much without knowing what percentage of the people answered this
question similarly a year ago, or in other parts of the world.&lt;/p>
&lt;p>With the help of Ahmed Shabani and Yousef Ibrahim, we created a third
case study after the
&lt;a href="https://retroharmonize.dataobservatory.eu/articles/eurobarometer.html" target="_blank" rel="noopener">Eurobarometer&lt;/a>,
and
&lt;a href="https://retroharmonize.dataobservatory.eu/articles/afrobarometer.html" target="_blank" rel="noopener">Afrobarometer&lt;/a>,
about working with the &lt;a href="https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html" target="_blank" rel="noopener">Arab
Barometer&lt;/a>
harmonized survey data files.&lt;/p>
&lt;p>&lt;em>Ex ante&lt;/em> survey harmonization means that researchers design
questionnaires that are asking the same questions with the same survey
methodology in repeated, distinct times (waves), or across different
countries with carefully harmonized question translations. &lt;em>Ex post&lt;/em>
harmonization means that the resulting data has the same variable
names, same variable coding, and can be joined into a tidy data frame
for joint statistical analysis. While seemingly a simple task, it
involves plenty of metadata adjustments, because established survey
programs like Eurobarometer, Afrobarometer or Arab Barometer have
several decades of history, and several decades of coding practices and
file formatting legacy.&lt;/p>
&lt;ul>
&lt;li>&lt;em>Variable harmonization&lt;/em> means that if the same question is called
&lt;code>Q108&lt;/code> in one microdata source and &lt;code>eval-parl-elections&lt;/code> in the other,
then we make sure that they get a harmonized, machine-readable
name without spaces and special characters.&lt;/li>
&lt;li>&lt;em>Variable label harmonization&lt;/em> means that the same questionnaire
items get the same numeric coding and same categorical labels.&lt;/li>
&lt;li>&lt;em>Missing case harmonization&lt;/em> means that various forms of missingness
are treated the same way.&lt;/li>
&lt;/ul>
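&lt;p>The variable harmonization step in the list above can be sketched in a few lines. This is only an illustration written in Python, not the retroharmonize API: the function names, the crosswalk table, and the target name &lt;code>eval_parl_elections&lt;/code> are all hypothetical.&lt;/p>

```python
import re

# Hypothetical crosswalk: source-specific variable names that refer to the
# same questionnaire item are mapped by hand to one harmonized name.
CROSSWALK = {
    "Q108": "eval_parl_elections",
    "eval-parl-elections": "eval_parl_elections",
}


def machine_readable(raw_name: str) -> str:
    """Lower-case a name and replace spaces and special characters with '_'."""
    name = raw_name.strip().lower()
    name = re.sub(r"[^a-z0-9]+", "_", name)  # any run of other characters
    return name.strip("_")                   # no leading or trailing '_'


def harmonized_name(source_name: str) -> str:
    """Use the crosswalk when the item is known, else only clean the name."""
    return CROSSWALK.get(source_name, machine_readable(source_name))


if __name__ == "__main__":
    for source_name in ["Q108", "eval-parl-elections", " Trust in Parliament (%) "]:
        print(repr(source_name), "->", harmonized_name(source_name))
```

&lt;p>The crosswalk is kept apart from the generic cleaning rule on purpose: a code such as &lt;code>Q108&lt;/code> carries no meaning on its own, so matching it to a question needs a documented, human-made decision, while removing spaces and special characters can be fully automatic.&lt;/p>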
&lt;figure id="figure-for-the-evaluation-of-the-economic-situation-dataset-get-the-country-averages-and-aggregates-from-zenodohttpdoiorg105281zenodo5036432-and-the-plot-in-jpg-or-png-from-figsharehttpsfigsharecomarticlesfigurearab_barometer_5_econ_eval_by_country_png14865498">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/blogposts_2021/arab_barometer_5_evon_eval_by_country.png" alt="For the evaluation of the economic situation dataset, get the country averages and aggregates from Zenodo (http://doi.org/10.5281/zenodo.5036432), and the plot in jpg or png from figshare (https://figshare.com/articles/figure/arab_barometer_5_econ_eval_by_country_png/14865498)." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
For the evaluation of the economic situation dataset, get the country averages and aggregates from &lt;a href="http://doi.org/10.5281/zenodo.5036432" target="_blank" rel="noopener">Zenodo&lt;/a>, and the plot in &lt;code>jpg&lt;/code> or &lt;code>png&lt;/code> from &lt;a href="https://figshare.com/articles/figure/arab_barometer_5_econ_eval_by_country_png/14865498" target="_blank" rel="noopener">figshare&lt;/a>.
&lt;/figcaption>&lt;/figure>
&lt;p>In our new &lt;a href="https://retroharmonize.dataobservatory.eu/articles/arabbarometer.html" target="_blank" rel="noopener">Arab Barometer case
study&lt;/a>,
the evaluation of parliamentary elections has the following labels. We
code them consistently &lt;code>1: free_and_fair&lt;/code>, &lt;code>2: some_minor_problems&lt;/code>,
&lt;code>3: some_major_problems&lt;/code> and &lt;code>4: not_free&lt;/code>.&lt;/p>
&lt;table>
&lt;colgroup>
&lt;col style="width: 50%" />
&lt;col style="width: 50%" />
&lt;/colgroup>
&lt;tbody>
&lt;tr class="odd">
&lt;td style="text-align: left;">“0. missing”&lt;/td>
&lt;td style="text-align: left;">“1. they were completely free and fair”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“2. they were free and fair, with some minor problems”&lt;/td>
&lt;td style="text-align: left;">“3. they were free and fair, with some major problems”&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td style="text-align: left;">“4. they were not free and fair”&lt;/td>
&lt;td style="text-align: left;">“8. i don’t know”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“9. declined to answer”&lt;/td>
&lt;td style="text-align: left;">“Missing”&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td style="text-align: left;">“They were completely free and fair”&lt;/td>
&lt;td style="text-align: left;">“They were free and fair, with some minor breaches”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“They were free and fair, with some major breaches”&lt;/td>
&lt;td style="text-align: left;">“They were not free and fair”&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td style="text-align: left;">“Don’t know”&lt;/td>
&lt;td style="text-align: left;">“Refuse”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“Completely free and fair”&lt;/td>
&lt;td style="text-align: left;">“Free and fair, but with minor problems”&lt;/td>
&lt;/tr>
&lt;tr class="odd">
&lt;td style="text-align: left;">“Free and fair, with major problems”&lt;/td>
&lt;td style="text-align: left;">“Not free or fair”&lt;/td>
&lt;/tr>
&lt;tr class="even">
&lt;td style="text-align: left;">“Don’t know (Do not read)”&lt;/td>
&lt;td style="text-align: left;">“Decline to answer (Do not read)”&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Of course, this harmonization is essential to get clean results like this:&lt;/p>
&lt;figure id="figure-for-evaluation-or-reuse-of-parliamentary-elections-dataset-get-the-replication-data-and-the-code-from-the-zenodohhttpsdoiorg105281zenodo5034759-open-repository">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="For evaluation or reuse of the parliamentary elections dataset, get the replication data and the code from the Zenodo (https://doi.org/10.5281/zenodo.5034759) open repository." srcset="
/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_30b9d9bccbe8f347c912dbe10ef5159c.webp 400w,
/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_f7e62366b8310160e9cdd16714a5ac44.webp 760w,
/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/arabb-comparison-country-chart_hu876e56138097bf35e9ab80c0a7351314_159521_30b9d9bccbe8f347c912dbe10ef5159c.webp"
width="506"
height="760"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
For evaluation or reuse of the parliamentary elections dataset, get the replication data and the code from the &lt;a href="https://doi.org/10.5281/zenodo.5034759">Zenodo&lt;/a> open repository.
&lt;/figcaption>&lt;/figure>
&lt;p>In our case study, we had three forms of missingness: the respondent
&lt;em>did not know&lt;/em> the answer, the respondent &lt;em>did not want&lt;/em> to answer, and
lastly, in some cases the &lt;em>respondent was not asked&lt;/em>, because the
country held no parliamentary elections. In numerical processing,
all these answers must be left out, for example when calculating averages, but
in a more detailed, categorical analysis they represent very
different cases. A high level of refusal to answer may in itself be an indicator
that democratic opinion forming is being suppressed.&lt;/p>
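&lt;p>The difference between the numerical and the categorical treatment can be illustrated with a minimal base R sketch. It does not use the &lt;code>retroharmonize&lt;/code> API; the answer labels, the numeric codes and all variable names are assumptions made up for this illustration.&lt;/p>
&lt;pre>&lt;code># Hypothetical answers from one survey wave (illustrative labels only).
answers = c("Very good", "Bad", "Don't know", "Declined", "Not asked", "Good")

# Assumed numeric coding of the valid answers; any other label is a form of missingness.
valid_codes = c("Very bad" = 1, "Bad" = 2, "Good" = 3, "Very good" = 4)

# Numeric view: labels without a numeric code become NA and drop out of the mean.
numeric_view = unname(valid_codes[answers])
mean_score = mean(numeric_view, na.rm = TRUE)

# Categorical view: the three forms of missingness remain separate categories.
all_levels = c(names(valid_codes), "Don't know", "Declined", "Not asked")
categorical_view = table(factor(answers, levels = all_levels))
&lt;/code>&lt;/pre>
&lt;p>In this toy example the mean is computed from the three valid answers only, while the frequency table still counts each form of missingness separately.&lt;/p>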
&lt;p>Survey harmonization with many countries entails tens of thousands of
small data management tasks, which, unless automatically documented,
logged, and created with reproducible code, make for a hopelessly
error-prone process. We believe that our open-source software will bring
to light much new statistical information, which, while legally
open, was never processed due to the large investment needed.&lt;/p>
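&lt;p>What one of these small, logged data management tasks could look like is sketched below in base R. This is not the &lt;code>retroharmonize&lt;/code> implementation: the crosswalk table, the harmonized label names and the hypothetical &lt;code>harmonize_labels()&lt;/code> helper are assumptions for illustration, and the input is assumed to contain no missing labels.&lt;/p>
&lt;pre>&lt;code># Hypothetical crosswalk from survey-specific labels to harmonized labels.
crosswalk = data.frame(
  original = c("Don't know (Do not read)", "Do not know",
               "Decline to answer (Do not read)", "Refused"),
  harmonized = c("do_not_know", "do_not_know", "declined", "declined"),
  stringsAsFactors = FALSE
)

harmonize_labels = function(x, crosswalk) {
  position = match(x, crosswalk$original)
  # Labels without a crosswalk entry are kept unchanged.
  recoded = ifelse(is.na(position), x, crosswalk$harmonized[position])
  # The change log records every distinct recoding that was applied.
  changed = recoded != x
  change_log = unique(data.frame(from = x[changed], to = recoded[changed],
                                 stringsAsFactors = FALSE))
  list(values = recoded, change_log = change_log)
}

result = harmonize_labels(c("Refused", "Good", "Do not know", "Refused"), crosswalk)
&lt;/code>&lt;/pre>
&lt;p>Here &lt;code>result$values&lt;/code> holds the harmonized answers and &lt;code>result$change_log&lt;/code> documents which labels were changed, so the step can be reviewed and reproduced.&lt;/p>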
&lt;p>We also started building experimental APIs that serve data produced by running
&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">retroharmonize&lt;/a> regularly.
We will place cultural access and participation data in the &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital
Music Observatory&lt;/a>, climate
awareness, policy support and self-reported mitigation strategies into
the &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data
Observatory&lt;/a>, and economy and
well-being data into our &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data
Observatory&lt;/a>.&lt;/p>
&lt;h2 id="further-plans">Further plans&lt;/h2>
&lt;p>Retrospective survey harmonization is a far more complex task than this
blogpost suggests, because established survey programs have gathered decades of legacy data in legacy coding schemes and legacy file formats. Putting the data right, and especially putting the invaluable descriptive and administrative (processing) metadata right, is a huge undertaking. We are releasing example code, datasets and charts
for researchers to compare our harmonized results with theirs, and to
improve our software.&lt;/p>
&lt;h3 id="use-our-software">Use our software&lt;/h3>
&lt;p>The &lt;code>retroharmonize&lt;/code> R package can be freely used, modified and
distributed under the GPL-3 license. For the main developer and
contributors, see the
&lt;a href="https://retroharmonize.dataobservatory.eu/" target="_blank" rel="noopener">package&lt;/a> homepage. If you
use it for your work, please kindly cite it as:&lt;/p>
&lt;p>Daniel Antal (2021). retroharmonize: Ex Post Survey Data Harmonization.
R package version 0.1.17. &lt;a href="https://doi.org/10.5281/zenodo.5034752" target="_blank" rel="noopener">https://doi.org/10.5281/zenodo.5034752&lt;/a>&lt;/p>
&lt;p>Download the &lt;a href="https://reprex-next.netlify.app/media/bibliography/cite-retroharmonize.bib" target="_blank">BibLaTeX entry&lt;/a>.&lt;/p>
&lt;h3 id="tutorial-to-work-with-the-arab-barometer-survey-data">Tutorial to work with the Arab Barometer survey data&lt;/h3>
&lt;p>Daniel Antal, &amp;amp; Ahmed Shaibani. (2021, June 26). Case Study: Working
With Arab Barometer Surveys for the retroharmonize R package (Version
0.1.6). Zenodo. &lt;a href="https://doi.org/10.5281/zenodo.5034759" target="_blank" rel="noopener">https://doi.org/10.5281/zenodo.5034759&lt;/a>&lt;/p>
&lt;p>Use the replication data to report potential
&lt;a href="https://github.com/rOpenGov/retroharmonize/issues" target="_blank" rel="noopener">issues&lt;/a> and
to suggest improvements to the code:&lt;/p>
&lt;p>Daniel Antal, &amp;amp; Ahmed Shaibani. (2021). Replication Data for the
retroharmonize R Package Case Study: Working With Arab Barometer Surveys
(Version 0.1.6) [Data set]. Zenodo.
&lt;a href="https://doi.org/10.5281/zenodo.5034741" target="_blank" rel="noopener">https://doi.org/10.5281/zenodo.5034741&lt;/a>&lt;/p>
&lt;h3 id="experimental-api">Experimental API&lt;/h3>
&lt;p>We are also experimenting with the automated placement of authoritative
and citable figures and datasets in open repositories. For the climate
awareness dataset, get the country averages and aggregates from
&lt;a href="http://doi.org/10.5281/zenodo.5036432" target="_blank" rel="noopener">Zenodo&lt;/a>, and the plot in &lt;code>jpg&lt;/code>
or &lt;code>png&lt;/code> from &lt;a href="https://figshare.com/articles/figure/arab_barometer_5_econ_eval_by_country_png/14865498" target="_blank" rel="noopener">figshare&lt;/a>.
Our plan is to release open data in a modern API with rich descriptive
metadata meeting the &lt;em>Dublin Core&lt;/em> and &lt;em>DataCite&lt;/em> standards, and further
administrative metadata for correct coding, joining and further
manipulating our data, or for easy import into your database.&lt;/p>
&lt;h3 id="join-our-open-source-effort">Join our open source effort&lt;/h3>
&lt;p>Want to help us improve our open data service? Want to include
&lt;a href="https://www.latinobarometro.org/lat.jsp" target="_blank" rel="noopener">Latinobarómetro&lt;/a> and the
&lt;a href="https://caucasusbarometer.org/en/datasets/" target="_blank" rel="noopener">Caucasus Barometer&lt;/a> in our
offering? Join the rOpenGov community of R package developers, and our
open collaboration to create the automated data observatories. We are
not only looking for
&lt;a href="https://reprex-next.netlify.app/authors/developer/">developers&lt;/a>,
but &lt;a href="https://reprex-next.netlify.app/authors/curator/">data
curators&lt;/a> and
&lt;a href="https://reprex-next.netlify.app/authors/team/">service design
associates&lt;/a>, too.&lt;/p></description></item><item><title>Open Data - The New Gold Without the Rush</title><link>https://reprex-next.netlify.app/post/2021-06-18-gold-without-rush/</link><pubDate>Fri, 18 Jun 2021 17:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-18-gold-without-rush/</guid><description>&lt;p>&lt;em>If open data is the new gold, why do even those who release it fail to reuse it? We created an open collaboration of data curators and open-source developers to dig into novel open data sources and/or increase the usability of existing ones. We transform reproducible research software into research-as-service.&lt;/em>&lt;/p>
&lt;p>Every year, the EU announces that billions and billions of data are now “open” again, but this is not gold. At least not in the form of nicely minted gold coins, but in gold dust and nuggets found in the muddy banks of chilly rivers. There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.&lt;/p>
&lt;figure id="figure-there-is-no-rush-for-it-because-panning-out-its-value-requires-a-lot-of-hours-of-hard-work-our-goal-is-to-automate-this-work-to-make-open-data-usable-at-scale-even-in-trustworthy-ai-solutions">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions." srcset="
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp 400w,
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_faa00e96d3d0b700cfcf1daa513f3ad2.webp 760w,
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.
&lt;/figcaption>&lt;/figure>
&lt;p>Most open data is not public: it cannot be downloaded from the Internet. In the EU parlance, “open” only means a legal entitlement to get access to it. And even in the rare cases when data is open and public, it is often marred by data quality issues. We are working on the prototypes of a data-as-service and research-as-service built with open-source statistical software that taps into various and often neglected open data sources.&lt;/p>
&lt;p>We are in the prototype phase in June and our intentions are to have a well-functioning service by the time of the conference, because we are working only with open-source software elements; our technological readiness level is already very high. The novelty of our process is that we are trying to further develop and integrate a few open-source technology items into technologically and financially sustainable data-as-service and even research-as-service solutions.&lt;/p>
&lt;figure id="figure-our-review-of-about-80-eu-un-and-oecd-data-observatories-reveals-that-most-of-them-do-not-use-these-organizationss-open-data---instead-they-use-various-and-often-not-well-processed-proprietary-sources">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;#39; open data. Instead, they use various, and often not well processed, proprietary sources." srcset="
/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_0079ea9844f6c5e52b52fd0e627467a2.webp 400w,
/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_ecd6d08ba5e9bac19c8173546f036651.webp 760w,
/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/observatory_screenshots/observatory_collage_16x9_800_hu47f74f5cdae63c7248c2367b9d148671_353025_0079ea9844f6c5e52b52fd0e627467a2.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our review of about 80 EU, UN and OECD data observatories reveals that most of them do not use these organizations&amp;rsquo; open data. Instead, they use various, and often not well processed, proprietary sources.
&lt;/figcaption>&lt;/figure>
&lt;p>We are taking a new and modern approach to the &lt;code>data observatory&lt;/code> concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science. Various UN and OECD bodies, and particularly the European Union, support or maintain more than 60 data observatories, or permanent data collection and dissemination points, but even these do not use the open data of these organizations and their members. We are building open-source data observatories, which run open-source statistical software that automatically processes and documents reusable public sector data (from public transport, meteorology, tax offices, taxpayer funded satellite systems, etc.) and reusable scientific data (from EU taxpayer funded research) into new, high quality statistical indicators.&lt;/p>
&lt;figure id="figure-we-are-taking-a-new-and-modern-approach-to-the-data-observatory-concept-and-modernizing-it-with-the-application-of-21st-century-data-and-metadata-standards-the-new-results-of-reproducible-research-and-data-science">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/slides/automated_observatory_value_chain.jpg" alt="We are taking a new and modern approach to the ‘data observatory’ concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
We are taking a new and modern approach to the ‘data observatory’ concept, and modernizing it with the application of 21st century data and metadata standards, the new results of reproducible research and data science
&lt;/figcaption>&lt;/figure>
&lt;ul>
&lt;li>We are building various open-source data collection tools in R and Python to bring up data from big data APIs and legally open, but not public, and not well served data sources. For example, we are working on capturing representative data from the Spotify API or creating harmonized datasets from the Eurobarometer and Afrobarometer survey programs.&lt;/li>
&lt;li>Open data is usually not public; whatever is legally accessible is usually not ready to use for commercial or scientific purposes. In Europe, almost all taxpayer funded data is legally open for reuse, but it is usually stored in heterogeneous formats, processed for an original governmental or scientific need, and documented to varying and often low standards. Our expert data curators are looking for new data sources that should be (re-)processed and re-documented to be usable for a wider community. We would like to introduce our service flow, which touches upon many important aspects of data scientist, data engineer and data curatorial work.&lt;/li>
&lt;li>We believe that even such generally trusted data sources as Eurostat often need to be reprocessed, because various legal and political constraints do not allow the common European statistical services to provide optimal quality data – for example, on the regional and city levels.&lt;/li>
&lt;li>With &lt;a href="https://reprex-next.netlify.app/authors/ropengov/">rOpenGov&lt;/a> and other partners, we are creating open-source statistical software in R to re-process these heterogeneous and low-quality data into tidy statistical indicators, and to automatically validate and document them.&lt;/li>
&lt;li>We are carefully documenting and releasing administrative, processing, and descriptive metadata, following international metadata standards, to make our data easy to find and easy to use for data analysts.&lt;/li>
&lt;li>We are automatically creating depositions and authoritative copies marked with an individual digital object identifier (DOI) to maintain data integrity.&lt;/li>
&lt;li>We are building simple databases and supporting APIs that release the data without restrictions, in a tidy format that is easy to join with other data, or easy to join into databases, together with standardized metadata.&lt;/li>
&lt;li>We maintain observatory websites (see: &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>, &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a>, &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data Observatory&lt;/a>) where not only the data is available, but we provide tutorials and use cases to make it easier to use them. Our mission is to show a modern, 21st century reimagination of the data observatory concept developed and supported by the UN, EU and OECD, and we want to show that modern reproducible research and open data could make the existing 60 data observatories and the planned new ones grow faster into data ecosystems.&lt;/li>
&lt;/ul>
&lt;p>We are working around the open collaboration concept, which is well-known in open source software development and reproducible science, but we try to make this agile project management methodology more inclusive, and include data curators and various institutional partners in this approach. Based around our early-stage startup, Reprex, and the open-source developer community rOpenGov, we are working together with other developers, data scientists, and domain specific data experts in climate change and mitigation, antitrust and innovation policies, and various aspects of the music and film industry.&lt;/p>
&lt;figure id="figure-our-open-collaboration-is-truly-open-new-data-curatorsauthorscuratordevelopersauthorsdeveloper-and-service-designersauthorsteam-even-volunteers-and-citizen-scientists-are-welcome-to-join">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our open collaboration is truly open: new data curators, developers and service designers, even volunteers and citizen scientists are welcome to join." srcset="
/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_a07a8e618fa7317f6f8256b9a334262e.webp 400w,
/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_3a4ae7f72478fd880961b08e1f7075dd.webp 760w,
/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/observatory_screenshots/dmo_contributors_hua4f41ef7327b64bb97f169af135070bd_140729_a07a8e618fa7317f6f8256b9a334262e.webp"
width="760"
height="427"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our open collaboration is truly open: new &lt;a href="https://reprex-next.netlify.app/authors/curator/">data curators&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer/">developers&lt;/a> and &lt;a href="https://reprex-next.netlify.app/authors/team/">service designers&lt;/a>, even volunteers and citizen scientists are welcome to join.
&lt;/figcaption>&lt;/figure>
&lt;p>Our open collaboration is truly open: new &lt;a href="https://reprex-next.netlify.app/authors/curator/">data curators&lt;/a>, data scientists and data engineers are welcome to join. We develop open-source software in an agile way, so you can join in with an intermediate programming skill to build unit tests or add new functionality, and if you are a beginner, you can start with documentation and testing our tutorials. For business, policy, and scientific data analysts, we provide unexploited, exciting new datasets. Advanced developers can &lt;a href="https://reprex-next.netlify.app/authors/developer/">join&lt;/a> our development team: the statistical data creation is mainly made in the R language, and the service infrastructure in Python and Go components.&lt;/p></description></item><item><title>There are Numerous Advantages of Switching from a National Level of the Analysis to a Sub National Level</title><link>https://reprex-next.netlify.app/post/2021-06-16-regions-release/</link><pubDate>Wed, 16 Jun 2021 12:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-16-regions-release/</guid><description>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/package_screenshots/regions_017_169.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;p>The new version of our &lt;a href="https://ropengov.org/" target="_blank" rel="noopener">rOpenGov&lt;/a> R package
&lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions&lt;/a> was released today on
CRAN. This package is one of the engines of our experimental open
data-as-service &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data
Observatory&lt;/a>, &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data
Observatory&lt;/a>, and &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music
Observatory&lt;/a> prototypes, which aim to
place open data packages into open-source applications.&lt;/p>
&lt;p>In international comparison, the use of nationally aggregated indicators
often has many disadvantages: they exhibit very different levels of
homogeneity, and data is often very limited in number of observations
for a cross-sectional analysis. When comparing European countries, a few
missing cases can limit the cross-section of countries to around 20
cases, which rules out the use of many analytical methods. Working with
sub-national statistics has many advantages: the similarity of the
aggregation level and the high number of observations can allow more precise
control of model parameters and errors, and the number of observations
grows from 20 to 200-300.&lt;/p>
&lt;figure id="figure-the-change-from-national-to-sub-national-level-comes-with-a-huge-data-processing-price-internal-administrative-boundaries-their-names-codes-codes-change-very-frequently">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="The change from national to sub-national level comes with a huge data processing price: internal administrative boundaries, their names, and their codes change very frequently." srcset="
/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_df043b13fb62aa7b45aa15fad51f4229.webp 400w,
/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_09a0d6124e334c5f1727420a059512a9.webp 760w,
/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/indicator_with_map_hue9f606f6489f63a22f67aeb7e2b3402b_98843_df043b13fb62aa7b45aa15fad51f4229.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
The change from national to sub-national level comes with a huge data processing price: internal administrative boundaries, their names, and their codes change very frequently.
&lt;/figcaption>&lt;/figure>
&lt;p>Yet the change from national to sub-national level comes with a huge
data processing price. National boundaries are relatively stable,
with only a handful of changes in each recent decade, because the change of
national boundaries requires a more-or-less global consensus. But states
are free to change their internal administrative boundaries, and they do
so often. This means that the names, identification codes
and boundary definitions of sub-national regions change very frequently.
Joining data from different sources and different years can be very
difficult.&lt;/p>
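&lt;p>A minimal base R sketch of the joining problem follows. It does not call any function of the &lt;code>regions&lt;/code> package; the region codes are placeholders rather than real NUTS codes, and the correspondence table and indicator values are assumptions invented for the illustration.&lt;/p>
&lt;pre>&lt;code># Hypothetical correspondence table between two code lists.
correspondence = data.frame(
  code_old = c("XX01", "XX02", "XX03"),
  code_new = c("XX11", "XX12", "XX03"),
  stringsAsFactors = FALSE
)

# The same indicator published under the old and the new coding.
indicator_old = data.frame(geo = c("XX01", "XX02", "XX03"),
                           value_2015 = c(10, 20, 30),
                           stringsAsFactors = FALSE)
indicator_new = data.frame(geo = c("XX11", "XX12", "XX03"),
                           value_2020 = c(11, 21, 31),
                           stringsAsFactors = FALSE)

# A naive join on the raw codes only matches the region whose code never changed.
naive_join = merge(indicator_old, indicator_new, by = "geo")

# Recode the older data to the newer code list first, then join.
recoded_old = indicator_old
recoded_old$geo = correspondence$code_new[match(recoded_old$geo, correspondence$code_old)]
recoded_join = merge(recoded_old, indicator_new, by = "geo")
&lt;/code>&lt;/pre>
&lt;p>The naive join keeps one region out of three, while the recoded join keeps all of them. The sketch only covers relabelled regions; when boundaries are split or merged, a simple one-to-one recoding like this is not sufficient.&lt;/p>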
&lt;figure id="figure-our-regions-r-packagehttpsregionsdataobservatoryeu-helps-the-data-processing-validation-and-imputation-of-sub-national-regional-datasets-and-their-coding">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our regions R package (https://regions.dataobservatory.eu/) helps the data processing, validation and imputation of sub-national, regional datasets and their coding." srcset="
/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_65df57cf4311bb2623535a1a5be044c0.webp 400w,
/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_81a53fd42fac7f0c3fe4e1a89d5b7892.webp 760w,
/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/recoded_indicator_with_map_hubda8124fbfd6305eacfd3d4f0fcd06cc_71873_65df57cf4311bb2623535a1a5be044c0.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our &lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions R package&lt;/a> helps the data processing, validation and imputation of sub-national, regional datasets and their coding.
&lt;/figcaption>&lt;/figure>
&lt;p>Switching from a national to a sub-national level of analysis has
numerous advantages, but it comes with a huge price in data
processing, validation and imputation, and the
&lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions&lt;/a> package aims to help with this
process.&lt;/p>
&lt;p>You can review the problem, and the code that created the two map
comparisons, in the &lt;a href="https://regions.dataobservatory.eu/articles/maping.html" target="_blank" rel="noopener">Maping Regional Data, Maping Metadata
Problems&lt;/a>
vignette article of the package. A more detailed problem description can
be found in &lt;a href="https://regions.dataobservatory.eu/articles/Regional_stats.html" target="_blank" rel="noopener">Working With Regional, Sub-National Statistical
Products&lt;/a>.&lt;/p>
&lt;p>This package is an offspring of the
&lt;a href="https://ropengov.github.io/eurostat/" target="_blank" rel="noopener">eurostat&lt;/a> package on
&lt;a href="https://ropengov.github.io/" target="_blank" rel="noopener">rOpenGov&lt;/a>. It started as a tool to
validate and re-code regional Eurostat statistics, but it aims to be a
general solution for all sub-national statistics. It will be developed
in parallel with other rOpenGov packages.&lt;/p>
&lt;h2 id="get-the-package">Get the Package&lt;/h2>
&lt;p>You can install the development version from
&lt;a href="https://github.com/" target="_blank" rel="noopener">GitHub&lt;/a> with:&lt;/p>
&lt;pre>&lt;code>devtools::install_github(&amp;quot;rOpenGov/regions&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>or the released version from CRAN:&lt;/p>
&lt;pre>&lt;code>install.packages(&amp;quot;regions&amp;quot;)
&lt;/code>&lt;/pre>
&lt;p>You can review the complete package documentation on
&lt;a href="https://regions.dataobservatory.eu/" target="_blank" rel="noopener">regions.dataobservatory.eu&lt;/a>. If
you find any problems with the code, please raise an issue on
&lt;a href="https://github.com/rOpenGov/regions" target="_blank" rel="noopener">GitHub&lt;/a>. Pull requests are welcome
if you agree with the &lt;a href="https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html" target="_blank" rel="noopener">Contributor Code of
Conduct&lt;/a>.&lt;/p>
&lt;p>If you use &lt;code>regions&lt;/code> in your work, please cite the
package as:
Daniel Antal, Kasia Kulma, Istvan Zsoldos, &amp;amp; Leo Lahti. (2021, June 16). regions (Version 0.1.7). CRAN. &lt;a href="https://doi.org/10.5281/zenodo.4965909">http://doi.org/10.5281/zenodo.4965909&lt;/a>&lt;/p>
&lt;p>&lt;a href="https://cran.r-project.org/package=regions" target="_blank" rel="noopener">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.r-pkg.org/badges/version/regions" alt="CRAN_Status_Badge" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/a>&lt;/p>
&lt;h2 id="join-us">Join us&lt;/h2>
&lt;p>&lt;em>Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>. More interested in environmental impact analysis? Try our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href="https://music.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> team!&lt;/em>&lt;/p>
&lt;p>&lt;a href="https://twitter.com/intent/follow?screen_name=EconDataObs" target="_blank" rel="noopener">
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://img.shields.io/twitter/follow/EconDataObs.svg?style=social" alt="Follow EconDataObs" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/a>&lt;/p></description></item><item><title>Open Data is Like Gold in the Mud Below the Chilly Waves of Mountain Rivers</title><link>https://reprex-next.netlify.app/post/2021-06-10-founder-daniel-antal/</link><pubDate>Thu, 10 Jun 2021 07:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-10-founder-daniel-antal/</guid><description>
&lt;figure id="figure-open-data-is-like-gold-in-the-mud-below-the-chilly-waves-of-mountain-rivers-panning-it-out-requires-a-lot-of-patience-or-a-good-machine">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine." srcset="
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp 400w,
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_faa00e96d3d0b700cfcf1daa513f3ad2.webp 760w,
/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/slides/gold_panning_slide_notitle_hu8f7296f20da8c17f972a0534c44322c2_1382486_b042523dffe8143dea3d8c8c9c3262f4.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>As the founder of the automated data observatories that are part of Reprex’s core activities, what type of data do you usually use in your day-to-day work?&lt;/strong>&lt;/p>
&lt;p>The automated data observatories are results of syndicated research, data pooling, and other creative solutions to the problem of missing or hard-to-find data. The music industry is a very fragmented industry, where market research budgets and data are scattered in tens of thousands of small organizations in Europe. Working for the music and film industry as a data analyst and economist was always a pain because most of the efforts went into trying to find any data that can be analyzed. I spent most of the last 7-8 years trying to find any sort of information, from satellites to government archives, that could be formed into actionable data. I see three big sources of information: textual, numeric, and continuous recordings from on-site, offsite, and satellite sensors. I am much better with numbers than with natural language processing, and I am &lt;a href="https://greendeal.dataobservatory.eu/post/2021-06-06-tutorial-cds/" target="_blank" rel="noopener">improving with sensory sources&lt;/a>. But technically, I can mint any systematic information (the text of an old book, a satellite image, or an opinion poll) into datasets.&lt;/p>
&lt;p>&lt;strong>For you, what would be the ultimate dataset, or datasets that you would like to see in the Economy Data Observatory?&lt;/strong>&lt;/p>
&lt;p>I am a data scientist now, but I used to be a regulatory economist, and I have worked a lot with competition policy and monopoly regulation issues. Our observatories can automatically monitor market and environmental processes, which would allow us to get into computational antitrust. Peter Ormosi, our competition curator, is particularly &lt;a href="https://economy.dataobservatory.eu/post/2021-06-02-data-curator-peter-ormosi/" target="_blank" rel="noopener">interested in&lt;/a> killer acquisitions: approved mergers of big companies that end up piling up patents that are not used. I am more interested in describing systematically which markets are getting more concentrated and more competitive, in real time. Does data concentration coincide with market concentration?&lt;/p>
&lt;p>To bring an example from the realm of our &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a>, which was a prototype for this one, I have been working for some time on creating streaming volume and price indexes, like the &lt;em>Dow Jones Industrial Average&lt;/em> or the various bond market indexes, that say more about price, demand, and potential revenue in music streaming markets all over the world. We did a first take on this in the &lt;a href="https://ceereport2020.ceemid.eu/" target="_blank" rel="noopener">Central European Music Industry Report&lt;/a> and recently we iterated on the model for the &lt;em>UK Intellectual Property Office&lt;/em> and the &lt;em>UK Music Creators’ Earnings&lt;/em> project. We want to take this further to create a pan-European streaming market index, and we will probably be the first to actually be able to report on music market concentrations, more or less in real time.&lt;/p>
&lt;figure id="figure-we-would-like-to-further-developer-our-20-country-streaming-indexeshttpsceereport2020ceemideumarkethtmlceemid-ci-volume-indexes-into-a-global-music-market-index">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="We would like to further develop our 20-country [streaming indexes](https://ceereport2020.ceemid.eu/market.html#ceemid-ci-volume-indexes) into a global music market index." srcset="
/media/img/blogposts_2021/medianvalue-1_hu5941f179e15628adbbb6d4dc0db86cd1_46382_59d954e926db1ce3ce9376aac454a3aa.webp 400w,
/media/img/blogposts_2021/medianvalue-1_hu5941f179e15628adbbb6d4dc0db86cd1_46382_75d58bfbbfae9d25c5551030d6d4206a.webp 760w,
/media/img/blogposts_2021/medianvalue-1_hu5941f179e15628adbbb6d4dc0db86cd1_46382_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/medianvalue-1_hu5941f179e15628adbbb6d4dc0db86cd1_46382_59d954e926db1ce3ce9376aac454a3aa.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
We would like to further develop our 20-country &lt;a href="https://ceereport2020.ceemid.eu/market.html#ceemid-ci-volume-indexes">streaming indexes&lt;/a> into a global music market index.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>Is there a number or piece of information that recently surprised you? If so, what was it?&lt;/strong>&lt;/p>
&lt;p>There were a few numbers that surprised me, and some of them were brought up by our observatory teams. Karel is &lt;a href="post/2021-06-08-data-curator-karel-volckaert/">talking&lt;/a> about the fact that not all green energy is green at all: many hydropower stations contribute to the greenhouse effect rather than reducing it. Annette brought up the growing interest in the &lt;a href="https://reprex-next.netlify.app/post/2021-06-09-team-annette-wong/">Dalmatian breed&lt;/a> after the Disney &lt;em>101 Dalmatians&lt;/em> movies, and it reminded me of the astonishing growth in interest in chess sets, chess tutorials, and platform subscriptions after the success of Netflix’s &lt;em>The Queen’s Gambit&lt;/em>.&lt;/p>
&lt;figure id="figure-the-queens-gambit-chess-boom-moves-online-by-rachael-dottle-on-bloombergcomhttpswwwbloombergcomgraphics2020-chess-boom">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="*The Queen’s Gambit’ Chess Boom Moves Online By Rachael Dottle* on [bloomberg.com](https://www.bloomberg.com/graphics/2020-chess-boom/)" srcset="
/media/img/blogposts_2021/queens_gambit_bloomberg_hub50434a1789646b36daf41ad10e65b52_92708_4fc47acea402086dd3891772877289db.webp 400w,
/media/img/blogposts_2021/queens_gambit_bloomberg_hub50434a1789646b36daf41ad10e65b52_92708_b60a154be5ab781fb70d16f62f39966c.webp 760w,
/media/img/blogposts_2021/queens_gambit_bloomberg_hub50434a1789646b36daf41ad10e65b52_92708_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/queens_gambit_bloomberg_hub50434a1789646b36daf41ad10e65b52_92708_4fc47acea402086dd3891772877289db.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
&lt;em>‘The Queen’s Gambit’ Chess Boom Moves Online&lt;/em> by Rachael Dottle on &lt;a href="https://www.bloomberg.com/graphics/2020-chess-boom/" target="_blank" rel="noopener">bloomberg.com&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;p>Annette is talking about the importance of cultural influencers, and on that theme, what could be more exciting than the fact that &lt;a href="https://www.netflix.com/nl-en/title/80234304" target="_blank" rel="noopener">Netflix’s biggest success&lt;/a> so far is not a detective series or a soap opera but a coming-of-age story of a female chess prodigy? Intelligence is sexy, and we are in the intelligence business.&lt;/p>
&lt;p>But to give a more serious and more sobering number, I recently read with surprise that there are &lt;a href="https://www.theguardian.com/society/2021/may/27/number-of-smokers-has-reached-all-time-high-of-11-billion-study-finds" target="_blank" rel="noopener">more people smoking cigarettes&lt;/a> on Earth in 2021 than in 1990. Population growth in developing countries more than offset the shrinking number of smokers in developed countries. While I live in Europe, where smoking is strongly declining, it reminds me that Europe’s population is a small part of the world. We cannot take for granted that our home-grown experiences about the world are globally valid.&lt;/p>
&lt;p>&lt;strong>Do you have a good example of really good, or really bad use of data?&lt;/strong>&lt;/p>
&lt;p>&lt;a href="https://fivethirtyeight.com/" target="_blank" rel="noopener">FiveThirtyEight.com&lt;/a> had a wonderful podcast series, produced by Jody Avirgan, called &lt;em>What’s the Point&lt;/em>. It is exactly about good and bad uses of data, and each episode is super interesting. Maybe the most memorable is &lt;em>Why the Bronx Really Burned&lt;/em>. New York City tried to measure fire response times, identify redundancies in service, and close or re-allocate fire stations accordingly. What resulted, though, was a perfect storm of bad data: The methodology was flawed, the analysis was rife with biases, and the results were interpreted in a way that stacked the deck against poorer neighborhoods. It is similar to many of the stories that Catherine D’Ignazio and Lauren F. Klein tell very compellingly in their much-celebrated book, &lt;em>Data Feminism&lt;/em>. Usually, the bad use of data starts with a bad data collection practice. Data analysts in corporations, NGOs, public policy organizations and even in science usually analyze the data that is available.&lt;/p>
&lt;p>&lt;em>You can find these examples, together with many more that our contributors recommend, in the motivating examples of the &lt;a href="https://contributors.dataobservatory.eu/data-curators.html#create-new-datasets" target="_blank" rel="noopener">Create New Datasets&lt;/a> and the &lt;a href="https://contributors.dataobservatory.eu/data-curators.html#critical-attitude" target="_blank" rel="noopener">Remain Critical&lt;/a> parts of our onboarding material. We hope that more and more professionals and citizen scientists will help us to create high-quality and open data.&lt;/em>&lt;/p>
&lt;p>The real power lies in designing a data collection program. A consistent data collection program usually requires an investment that only powerful organizations, such as government agencies, very large corporations, or the richest universities can afford. You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling.&lt;/p>
&lt;figure id="figure-you-cannot-really-analyze-the-data-that-is-not-collected-and-recorded-and-usually-what-is-not-recorded-is-more-interesting-than-what-is-our-observatories-want-to-democratize-the-data-collection-process-and-make-it-more-available-more-shared-with-research-automation-and-pooling">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/slides/value_added_from_automation.png" alt="You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
You cannot really analyze the data that is not collected and recorded; and usually what is not recorded is more interesting than what is. Our observatories want to democratize the data collection process and make it more available, more shared with research automation and pooling.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>From your perspective, what do you see being the greatest problem with open data in 2021?&lt;/strong>&lt;/p>
&lt;p>I have been involved with open data policies since 2004. The problem has not changed much: more and more data are available from governmental and scientific sources, but in a form that makes them useless. Data without clear description and clear processing information is useless for analytical purposes: it cannot be integrated with other data, and it cannot be trusted and verified. If researchers or government entities that fall under the &lt;a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2019.172.01.0056.01.ENG" target="_blank" rel="noopener">Open Data Directive&lt;/a> release data for reuse without descriptive or processing metadata, it is almost as if they did not release anything. You need this additional information to make valid analyses of the data, and reverse-engineering it may cost more than recollecting the data in a properly documented process. Our developers, particularly &lt;a href="https://reprex-next.netlify.app/post/2021-06-04-developer-leo-lahti/">Leo&lt;/a> and &lt;a href="post/2021-06-07-data-curator-pyry-kantanen/">Pyry&lt;/a>, are talking eloquently about why you have to be careful even with governmental statistical products, and constantly be on the lookout for data quality problems.&lt;/p>
&lt;figure id="figure-our-apidata-is-not-only-publishing-descriptive-and-processing-metadata-alongside-with-our-data-but-we-also-make-all-critical-elements-of-our-processing-code-available-for-peer-review-on-ropengovauthorsropengov">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/EDO_API_metadata_table.png" alt="Our [API](/#data) is not only publishing descriptive and processing metadata alongside with our data, but we also make all critical elements of our processing code available for peer-review on [rOpenGov](/authors/ropengov/)" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our &lt;a href="https://reprex-next.netlify.app/#data">API&lt;/a> not only publishes descriptive and processing metadata alongside our data, but we also make all critical elements of our processing code available for peer review on &lt;a href="https://reprex-next.netlify.app/authors/ropengov/">rOpenGov&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>What do you think the Economy Data Observatory, and our other automated observatories do, to make open data more credible in the European economic policy community and be accepted as verified information?&lt;/strong>&lt;/p>
&lt;p>Most of our work is in research automation, and a very large part of our effort is aimed at reverse engineering missing descriptive and processing metadata. In a way, I like to compare our working method to that of the open-source intelligence platform &lt;a href="https://www.bellingcat.com" target="_blank" rel="noopener">Bellingcat&lt;/a>. They were able to use publicly available, &lt;a href="https://www.bellingcat.com/category/resources/case-studies/?fwp_tags=mh17" target="_blank" rel="noopener">scattered information from satellites and social media&lt;/a> to identify each member of the Russian military company that illegally entered the territory of Ukraine and shot down Malaysia Airlines flight MH17 with 298, mainly Dutch, civilians on board.&lt;/p>
&lt;figure id="figure-how-we-create-value-for-research-oriented-consultancies-public-policy-institutes-university-research-teams-journalists-or-ngos">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/slides/automated_observatory_value_chain.jpg" alt="How we create value for research-oriented consultancies, public policy institutes, university research teams, journalists or NGOs." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
How we create value for research-oriented consultancies, public policy institutes, university research teams, journalists or NGOs.
&lt;/figcaption>&lt;/figure>
&lt;p>We do not do such investigations, but we work very similarly in how we filter through many data sources and attempt to verify them when their descriptions and processing history are unknown. In recent years, we were able to restore the metadata of many European and African open data surveys, economic impact and environmental impact data, and many other open data that had been lying around for years without users.&lt;/p>
&lt;p>Open data is like gold in the mud below the chilly waves of mountain rivers. Panning it out requires a lot of patience, or a good machine. I think we will come to as surprising and strong findings as Bellingcat, but we are not focusing on individual events and stories, but on social and environmental processes and changes.&lt;/p>
&lt;figure id="figure-join-our-open-collaboration-economy-data-observatory-team-as-a-data-curatorauthorscurator-developerauthorsdeveloper-or-business-developerauthorsteam-or-share-your-data-in-our-public-repository-economy-data-observatory-on-zenodohttpszenodoorgcommunitieseconomy_observatory">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/edo_and_zenodo.png" alt="Join our open collaboration Economy Data Observatory team as a [data curator](/authors/curator), [developer](/authors/developer) or [business developer](/authors/team), or share your data in our public repository [Economy Data Observatory on Zenodo](https://zenodo.org/communities/economy_observatory/)" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>, or share your data in our public repository &lt;a href="https://zenodo.org/communities/economy_observatory/" target="_blank" rel="noopener">Economy Data Observatory on Zenodo&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;h2 id="join-us">Join us&lt;/h2>
&lt;p>&lt;em>Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>. More interested in environmental impact analysis? Try our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> team! Or your interest lies more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href="https://music.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> team!&lt;/em>&lt;/p></description></item><item><title>Comparing Data to Oil is a Cliché: Crude Oil Has to Go Through a Number of Steps and Pipes Before it Becomes Useful</title><link>https://reprex-next.netlify.app/post/2021-06-07-data-curator-pyry-kantanen/</link><pubDate>Mon, 07 Jun 2021 10:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-07-data-curator-pyry-kantanen/</guid><description>&lt;p>&lt;strong>As a developer at rOpenGov, and as an economic sociologist, what type of data do you usually use in your work?&lt;/strong>&lt;/p>
&lt;p>Generally speaking, people&amp;rsquo;s access to (or inequalities in accessing) different types of resources and their ability to transform these resources into other types of resources is what interests me. The data I usually work with is the kind of data that is actually nicely covered by existing &lt;a href="http://ropengov.org/projects/" target="_blank" rel="noopener">rOpenGov tools&lt;/a>: data about population demographics and administrative units from Statistics Finland, statistical information on welfare and health from Sotkanet and also data from Eurostat. Aside from these, a lot of the information of course comes from surveys and texts scraped from the internet.&lt;/p>
&lt;figure id="figure-we-are-placing-the-growing-number-of-ropengov-toolshttpropengovorgprojects-in-a-modern-application-with-a-user-friendly-service-and-a-modern-data-api">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/partners/rOpenGov-intro.png" alt="We are placing the growing number of [rOpenGov tools](http://ropengov.org/projects/) in a modern application with a user-friendly service and a modern data API." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
We are placing the growing number of &lt;a href="http://ropengov.org/projects/" target="_blank" rel="noopener">rOpenGov tools&lt;/a> in a modern application with a user-friendly service and a modern data API.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>In your ideal data world, what would be the ultimate dataset, or datasets that you would like to see in the Music Data Observatory?&lt;/strong>&lt;/p>
&lt;p>Late spring and early summer is, at least for me, defined by the Eurovision Song Contest. Every year watching the contest makes me ponder the state of the music industry in my home country Finland as well as in Europe. Was the song produced by homegrown talent or was it imported? Was it better received by the professional jury or the public? How well does the domestic appeal of an artist translate to the international stage? Many interesting phenomena are difficult to quantify in a meaningful way, and writing a catchy song with international appeal is probably more an art than a science. Nevertheless, that should not deter us from trying, as music, too, is bound by certain rules and regularities that can be researched.&lt;/p>
&lt;figure id="figure-music-too-is-bound-by-certain-rules-and-regularities-that-can-be-researched-our-digital-music-observatory-and-its-listen-localhttpslistenlocalcommunity-experimental-app-does-this-exactly-and-we-would-love-to-create-eurovision-musicology-datasets-photo-eurovision-song-contest-2021-press-photo-by-jordy-brada">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/developers/eurovision_2021.jpg" alt="Music, too, is bound by certain rules and regularities that can be researched. Our Digital Music Observatory and its [Listen Local](https://listenlocal.community/) experimental App does this exactly, and we would love to create Eurovision musicology datasets. Photo: Eurovision Song Contest 2021 press photo by Jordy Brada" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Music, too, is bound by certain rules and regularities that can be researched. Our Digital Music Observatory and its &lt;a href="https://listenlocal.community/" target="_blank" rel="noopener">Listen Local&lt;/a> experimental app does exactly this, and we would love to create Eurovision musicology datasets. Photo: Eurovision Song Contest 2021 press photo by Jordy Brada
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>Why did you decide to join the EU Datathon challenge team and why do you think that this would be a game changer for researchers and policymakers?&lt;/strong>&lt;/p>
&lt;p>The challenge has, in my opinion, great potential in leading by example when it comes to open data access and reproducible research. Comparing data to oil is a cliché, but it is fitting in the sense that crude oil has to go through a number of steps and pipes before it becomes useful. Most users, and especially policymakers, appreciate the ease of use of the finished product, but the quality of the product and the process must also be guaranteed somehow. Openness and peer-review practices are the best guarantors in the field of data, just as industrial standards and regulations are in the oil industry.&lt;/p>
&lt;figure id="figure-we-provide-many-layers-of-fully-transparent-quality-control-about-the-data-we-are-placing-in-our-data-apis-and-provide-for-our-end-users">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/EDO_API_metadata_table.png" alt="We provide many layers of fully transparent quality control about the data we are placing in our data APIs and provide for our end-users." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
We provide many layers of fully transparent quality control about the data we are placing in our data APIs and provide for our end-users.
&lt;/figcaption>&lt;/figure>
&lt;h2 id="join-us">Join us&lt;/h2>
&lt;p>&lt;em>Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>. More interested in environmental impact analysis? Try our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> team! Or your interest lies more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href="https://music.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> team!&lt;/em>&lt;/p></description></item><item><title>Creating Algorithmic Tools to Interpret and Communicate Open Data Efficiently</title><link>https://reprex-next.netlify.app/post/2021-06-04-developer-leo-lahti/</link><pubDate>Fri, 04 Jun 2021 10:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-04-developer-leo-lahti/</guid><description>&lt;p>&lt;strong>As a developer at rOpenGov, what type of data do you usually use in your work?&lt;/strong>&lt;/p>
&lt;p>As an academic data scientist whose research focuses on the development of general-purpose algorithmic methods, I work with a range of applications from life sciences to humanities. Population studies play a big role in our research, and often the information that we can draw from public sources - geospatial, demographic, environmental - provides invaluable support. We typically use open data in combination with sensitive research data but some of the research questions can be readily addressed based on open data from statistical authorities such as Statistics Finland or Eurostat.&lt;/p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/partners/rOpenGov-intro.png" alt="" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;p>&lt;strong>In your ideal data world, what would be the ultimate dataset, or datasets that you would like to see in the Music Data Observatory?&lt;/strong>&lt;/p>
&lt;p>One line of our research analyses the historical trends and spread of knowledge production, in particular book printing based on large-scale metadata collections. It would be interesting to extend this research to music, to understand the contemporary trends as well as the broader historical developments. Gaining access to a large systematic collection of music and composition data from different countries across long periods of time would make this possible.&lt;/p>
&lt;p>&lt;strong>Why did you decide to join the challenge and why do you think that this would be a game changer for researchers and policymakers?&lt;/strong>&lt;/p>
&lt;p>Joining the challenge was a natural development based on our overall activities in this area; &lt;a href="http://ropengov.org/community/" target="_blank" rel="noopener">the rOpenGov project&lt;/a> has been around for a decade now, since the early days of the broader open data movement. This has also created an active international developer network and we felt well equipped for picking up the challenge. The game changer for researchers is that the project highlights the importance of data quality, even when dealing with official statistics, and provides new methods to solve these issues efficiently through the open collaboration model. For policymakers, this provides access to new high-quality curated data and case studies that can support evidence-based decision-making.&lt;/p>
&lt;p>&lt;strong>Do you have a favorite, or most used open governmental or open science data source? What do you think about it? Could it be improved?&lt;/strong>&lt;/p>
&lt;p>Regarding open government data, one of my favorites is not a single data source but a data representation standard. The &lt;a href="https://www.scb.se/en/services/statistical-programs-for-px-files/#:~:text=PX%20is%20a%20standard%20format,and%20data." target="_blank" rel="noopener">px format&lt;/a> is widely used by statistical authorities in various countries, and this has allowed us to create R tools that allow the retrieval and analysis of official statistics from many countries across Europe, spanning dozens of statistical institutions. Standardization of open data formats allows us to build robust algorithmic tools for downstream data analysis and visualization. Open government data is still too often shared in obscure, non-standard or closed-source file formats and this is creating significant bottlenecks for the development of scalable and interoperable AI and machine learning methods that can harness the full potential of open data.&lt;/p>
&lt;figure id="figure-regarding-open-government-data-one-of-my-favorites-is-not-a-single-data-source-but-a-data-representation-standard-the-px-format">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/developers/PxWeb.png" alt="Regarding open government data, one of my favorites is not a single data source but a data representation standard, the Px format." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Regarding open government data, one of my favorites is not a single data source but a data representation standard, the Px format.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>From your perspective, what do you see being the greatest problem with open data in 2021?&lt;/strong>&lt;/p>
&lt;p>Although there are a variety of open data sources available (and the numbers continue to increase), the availability of open algorithmic tools to interpret and communicate open data efficiently is lagging behind. One of the greatest challenges for open data in 2021 is to demonstrate how we can maximize the potential of open data by designing smart tools for open data analytics.&lt;/p>
&lt;p>&lt;strong>What can our automated data observatories do to make open data more credible in the European economic policy community and be accepted as verified information?&lt;/strong>&lt;/p>
&lt;p>The professional network backing the project, and the possibility of getting critical feedback and later adoption by the academic communities, will support the efforts. Transparency of the data harmonization operations is the key to credibility, and will be further supported by concrete benchmarks that highlight the critical differences in drawing conclusions based on original sources versus the harmonized high-quality data sets.&lt;/p>
&lt;figure id="figure-we-need-to-get-critical-feedback-and-later-adoption-by-the-academic-communities">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="We need to get critical feedback and later adoption by the academic communities." srcset="
/media/img/observatory_screenshots/greendeal_and_zenodo_huddcd7485e56cb33c97d3e664ae383275_281994_debfc54dcf2193c7c800dab0f36de429.webp 400w,
/media/img/observatory_screenshots/greendeal_and_zenodo_huddcd7485e56cb33c97d3e664ae383275_281994_3b536090581f2795373e801d65371e20.webp 760w,
/media/img/observatory_screenshots/greendeal_and_zenodo_huddcd7485e56cb33c97d3e664ae383275_281994_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/observatory_screenshots/greendeal_and_zenodo_huddcd7485e56cb33c97d3e664ae383275_281994_debfc54dcf2193c7c800dab0f36de429.webp"
width="760"
height="507"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
We need to get critical feedback and later adoption by the academic communities.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>How can we ensure the long-term sustainability of the efforts?&lt;/strong>&lt;/p>
&lt;p>The extent of the open data space is such that no single individual or institution can address all the emerging needs in this area. The open developer networks play a huge role in the development of algorithmic methods, and strong communities have developed around specific open data analytical environments such as R, Python, and Julia. These communities support networked collaboration and provide services such as software peer review. The long-term sustainability will depend on the support that such developer communities can receive from individual contributors as well as from institutions and governments.&lt;/p>
&lt;figure id="figure-join-our-open-collaboration-economy-data-observatory-team-as-a-data-curatorauthorscurator-developerauthorsdeveloper-or-business-developerauthorsteam-or-share-your-data-in-our-public-repository-economy-data-observatory-on-zenodohttpszenodoorgcommunitieseconomy_observatory">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/edo_and_zenodo.png" alt="Join our open collaboration Economy Data Observatory team as a [data curator](/authors/curator), [developer](/authors/developer) or [business developer](/authors/team), or share your data in our public repository [Economy Data Observatory on Zenodo](https://zenodo.org/communities/economy_observatory/)" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>, or share your data in our public repository &lt;a href="https://zenodo.org/communities/economy_observatory/" target="_blank" rel="noopener">Economy Data Observatory on Zenodo&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;h2 id="join-us">Join us&lt;/h2>
&lt;p>&lt;em>Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>. More interested in environmental impact analysis? Try our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> team! Or does your interest lie more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href="https://music.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> team!&lt;/em>&lt;/p></description></item><item><title>Economic and Environment Impact Analysis, Automated for Data-as-Service</title><link>https://reprex-next.netlify.app/post/2021-06-03-iotables-release/</link><pubDate>Thu, 03 Jun 2021 16:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-03-iotables-release/</guid><description>&lt;p>We have released a new version of
&lt;a href="https://iotables.dataobservatory.eu/" target="_blank" rel="noopener">iotables&lt;/a> as part of the
&lt;a href="http://ropengov.org/" target="_blank" rel="noopener">rOpenGov&lt;/a> project. The package, as the name
suggests, works with European symmetric input-output tables (SIOTs).
SIOTs are among the most complex governmental statistical products. They
show how each country’s 64 agricultural, industrial, service, and
sometimes household sectors relate to each other. They are estimated
from various components of GDP and from tax collection data, at least
every five years.&lt;/p>
&lt;p>SIOTs offer great value to policy-makers and analysts who want to make more
than educated guesses about how a million euros, pounds or Czech korunas spent
on a certain sector will impact other sectors of the economy, employment
or GDP. What happens when a bank starts to give new loans and advertise
them? How is an increase in economic activity going to affect the amount
of wages paid, and where will consumers most likely spend their
wages? As national economies begin to reopen after the COVID-19 pandemic
lockdowns, a timely use of SIOTs is to calculate the direct and indirect
employment effects or value added effects of government grant programs
to sectors such as cultural and creative industries or actors such as
venues for performing arts, movie theaters, bars and restaurants.&lt;/p>
&lt;p>Making such calculations requires a bit of matrix algebra and an
understanding of input-output economics, direct and indirect effects, and
multipliers. Economists, grant designers, and policy makers have those
skills, but until now such calculations were made either in cumbersome
Excel sheets or in proprietary software, because the key to these calculations
is to keep vectors and matrices, which have at least one dimension of
64, perfectly aligned. We made this process reproducible with
&lt;a href="https://iotables.dataobservatory.eu/" target="_blank" rel="noopener">iotables&lt;/a> and
&lt;a href="https://CRAN.R-project.org/package=eurostat" target="_blank" rel="noopener">eurostat&lt;/a> on
&lt;a href="http://ropengov.org/" target="_blank" rel="noopener">rOpenGov&lt;/a>.&lt;/p>
&lt;figure id="figure-our-iotables-package-creates-direct-indirect-effects-and-multipliers-programatically-our-observatory-will-make-those-indicators-available-for-all-european-countries">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/package_screenshots/iotables_0_4_5.png" alt="Our iotables package creates direct and indirect effects and multipliers programmatically. Our observatory will make those indicators available for all European countries." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our iotables package creates direct and indirect effects and multipliers programmatically. Our observatory will make those indicators available for all European countries.
&lt;/figcaption>&lt;/figure>
&lt;h2 id="accessing-and-tidying-the-data-programmatically">Accessing and tidying the data programmatically&lt;/h2>
&lt;p>The iotables package is in a way an extension to the &lt;em>eurostat&lt;/em> R
package, which provides programmatic access to the
&lt;a href="https://ec.europa.eu/eurostat" target="_blank" rel="noopener">Eurostat&lt;/a> data warehouse. The reason for
releasing a new package is that working with SIOTs requires plenty of
meticulous data wrangling based on various &lt;em>metadata&lt;/em> sources, apart
from actually accessing the &lt;em>data&lt;/em> itself. When working with matrix
equations, the bar is higher than with tidy data. Not only must your rows
and columns match, but their ordering must strictly conform to the
quadrants of the matrix system, including the connecting trade or tax
matrices.&lt;/p>
&lt;p>When you download a country’s SIOT, you receive a very long data
frame in long form, which contains the matrix values and their
labels like this:&lt;/p>
&lt;pre>&lt;code># assumed download call (the original post shows only its output below)
naio_10_cp1700 &amp;lt;- eurostat::get_eurostat(&amp;quot;naio_10_cp1700&amp;quot;)
## Table naio_10_cp1700 cached at C:\Users\...\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds
# we save it for further reference here
saveRDS(naio_10_cp1700, &amp;quot;not_included/naio_10_cp1700_date_code_FF.rds&amp;quot;)
# should you need to retrieve the large tempfiles, they are in
dir(file.path(tempdir(), &amp;quot;eurostat&amp;quot;))
# show the first five rows of the long-form table
dplyr::slice_head(naio_10_cp1700, n = 5)
## # A tibble: 5 x 7
##   unit    stk_flow induse  prod_na geo       time        values
##   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;    &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;   &amp;lt;chr&amp;gt;     &amp;lt;date&amp;gt;       &amp;lt;dbl&amp;gt;
## 1 MIO_EUR DOM      CPA_A01 B1G     EA19      2019-01-01 141873.
## 2 MIO_EUR DOM      CPA_A01 B1G     EU27_2020 2019-01-01 174976.
## 3 MIO_EUR DOM      CPA_A01 B1G     EU28      2019-01-01 187814.
## 4 MIO_EUR DOM      CPA_A01 B2A3G   EA19      2019-01-01      0
## 5 MIO_EUR DOM      CPA_A01 B2A3G   EU27_2020 2019-01-01      0
&lt;/code>&lt;/pre>
&lt;p>The metadata reads like this: the units are in millions of euros, we are
analyzing domestic flows, and the national account items &lt;code>B1-B2&lt;/code> for the
industry &lt;code>A01&lt;/code>. The information of a 64x64 matrix (the SIOT) and its
connecting matrices, such as taxes, employment, or CO&lt;sub>2&lt;/sub>
emissions, must be placed in exactly one correct ordering of columns and
rows. A data wrangling error will usually result either in an outright error
(the matrix equation has no solution) or, what is worse, in an algebraic
error that is very difficult to trace. Our package not only labels this
data meaningfully, but also creates tidy data frames that contain each
necessary matrix or vector with a key column.&lt;/p>
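&lt;p>To make the alignment problem concrete, here is a minimal base R sketch. It uses made-up numbers and the hypothetical sector codes &lt;code>A&lt;/code> and &lt;code>B&lt;/code> rather than real Eurostat data, and it is only an illustration of the idea, not the code that iotables runs internally. It places long-form values into a matrix through their row and column keys, so the order of the input rows does not matter.&lt;/p>
&lt;pre>&lt;code># toy long-form data: one row per matrix cell, in arbitrary order
long &amp;lt;- data.frame(
  prod_na = c('B', 'A', 'B', 'A'),  # row key (hypothetical codes)
  induse  = c('A', 'B', 'B', 'A'),  # column key (hypothetical codes)
  values  = c(3, 2, 4, 1),
  stringsAsFactors = FALSE
)
# one explicit ordering shared by rows and columns
ordering &amp;lt;- c('A', 'B')
M &amp;lt;- matrix(0, nrow = 2, ncol = 2, dimnames = list(ordering, ordering))
# fill each cell by its (row key, column key) pair, not by position
M[cbind(long$prod_na, long$induse)] &amp;lt;- long$values
# rows and columns must share the same order before any matrix algebra
stopifnot(identical(rownames(M), colnames(M)))
M
&lt;/code>&lt;/pre>
&lt;p>With 64 sectors and several connecting matrices the same principle applies, which is why a shared key column is safer than relying on row positions.&lt;/p>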
&lt;p>The iotables package contains the abbreviations and human-readable
labels of three statistical vocabularies: the so-called
&lt;code>COICOP&lt;/code> product codes, the &lt;code>NACE&lt;/code> industry codes, and the vocabulary of
the &lt;code>ESA2010&lt;/code> definition of national accounts (which is the government
equivalent of corporate accounting).&lt;/p>
&lt;p>Our package currently solves all equations for direct, indirect effects,
multipliers and inter-industry linkages. Backward linkages show what
happens with the suppliers of an industry, such as catering or
advertising in the case of music festivals, if the festivals reopen. The
forward linkages show how much extra demand this creates for connecting
services that treat festivals as a ‘supplier’, such as cultural tourism.&lt;/p>
&lt;h2 id="lets-seen-an-example">Let’s see an example&lt;/h2>
&lt;pre>&lt;code>## Downloading employment data from the Eurostat database.
## Table lfsq_egan22d cached at C:\Users\...\Temp\RtmpGQF4gr/eurostat/lfsq_egan22d_date_code_FF.rds
&lt;/code>&lt;/pre>
&lt;p>We match the employment data with the latest structural information from the
&lt;a href="http://appsso.eurostat.ec.europa.eu/nui/show.do?wai=true&amp;amp;dataset=naio_10_cp1700" target="_blank" rel="noopener">Symmetric input-output table at basic prices (product by
product)&lt;/a>
Eurostat product. A quick look at the Eurostat website already shows
that there is a lot of work ahead to make the data look like an actual
Symmetric input-output table. Download it with &lt;code>iotable_get()&lt;/code>, which
does basic labelling and preprocessing on the raw Eurostat files.
Because of the size of the unfiltered dataset on Eurostat, the following
code may take several minutes to run.&lt;/p>
&lt;pre>&lt;code>library(iotables)
# download and label the 2015 Slovak SIOT in millions of euros
sk_io &amp;lt;- iotable_get(labelled_io_data = NULL,
                     source = &amp;quot;naio_10_cp1700&amp;quot;, geo = &amp;quot;SK&amp;quot;,
                     year = 2015, unit = &amp;quot;MIO_EUR&amp;quot;,
                     stk_flow = &amp;quot;TOTAL&amp;quot;,
                     labelling = &amp;quot;iotables&amp;quot;)
## Reading cache file C:\Users\..\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds
## Table naio_10_cp1700 read from cache file: C:\Users\..\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds
## Saving 808 input-output tables into the temporary directory
## C:\Users\...\Temp\RtmpGQF4gr
## Saved the raw data of this table type in temporary directory C:\Users\...\Temp\RtmpGQF4gr/naio_10_cp1700.rds.
&lt;/code>&lt;/pre>
&lt;p>The &lt;code>input_coefficient_matrix_create()&lt;/code> function creates the input coefficient
matrix, which is used by most of the analytical functions.&lt;/p>
&lt;p>&lt;em>a&lt;/em>&lt;sub>&lt;em>ij&lt;/em>&lt;/sub> = &lt;em>X&lt;/em>&lt;sub>&lt;em>ij&lt;/em>&lt;/sub> / &lt;em>x&lt;/em>&lt;sub>&lt;em>j&lt;/em>&lt;/sub>&lt;/p>
&lt;p>It checks the correct ordering of columns, and furthermore it replaces 0
values with 0.000001 to avoid division by zero.&lt;/p>
&lt;pre>&lt;code>input_coeff_matrix_sk &amp;lt;- input_coefficient_matrix_create(
  data_table = sk_io
)
## Columns and rows of real_estate_imputed_a, extraterriorial_organizations are all zeros and will be removed.
&lt;/code>&lt;/pre>
&lt;p>Then you can create the Leontief inverse, which contains all the
structural information about the relationships among the 64 sectors of the
chosen country, in this case Slovakia, ready for the main equations of
input-output economics.&lt;/p>
&lt;pre>&lt;code>I_sk &amp;lt;- leontieff_inverse_create(input_coeff_matrix_sk)
&lt;/code>&lt;/pre>
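&lt;p>For readers who want to see the algebra behind these two steps, the following base R sketch reproduces it on a toy three-sector economy. The sector names and all numbers are made up for illustration, and the sketch leaves out the labelling, ordering checks and zero handling that the package functions perform.&lt;/p>
&lt;pre>&lt;code># toy inter-industry flows X (rows sell to columns); made-up numbers
sectors &amp;lt;- c('agriculture', 'manufacturing', 'services')
X &amp;lt;- matrix(c(10, 20,  5,
              15, 40, 25,
               5, 30, 50),
            nrow = 3, byrow = TRUE,
            dimnames = list(sectors, sectors))
# total output of each sector (also made up)
x &amp;lt;- c(agriculture = 100, manufacturing = 200, services = 250)

# input coefficients: a_ij = X_ij / x_j, i.e. divide column j by x_j
A &amp;lt;- sweep(X, 2, x, '/')

# Leontief inverse: the inverse of (I - A)
L &amp;lt;- solve(diag(3) - A)

# final demand implied by the toy table
f &amp;lt;- x - rowSums(X)
# consistency check: the Leontief inverse maps final demand back to output
all.equal(as.numeric(L %*% f), as.numeric(x))
&lt;/code>&lt;/pre>
&lt;p>The final check only holds when the rows and columns of &lt;code>X&lt;/code> and the elements of &lt;code>x&lt;/code> are in the same sector order, which is exactly the alignment the package enforces.&lt;/p>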
&lt;p>And take out the primary inputs:&lt;/p>
&lt;pre>&lt;code>primary_inputs_sk &amp;lt;- coefficient_matrix_create(
  data_table = sk_io,
  total = 'output',
  return = 'primary_inputs')
## Columns and rows of real_estate_imputed_a, extraterriorial_organizations are all zeros and will be removed.
&lt;/code>&lt;/pre>
&lt;p>Now let’s see what happens if the government tries to stimulate the economy in
three sectors, agriculture, car manufacturing, and R&amp;amp;D, with a billion
euros. Direct effects measure the initial, direct impact of the change
in demand and supply for a product. When production goes up, it will
create demand in all supply industries (backward linkages) and create
opportunities in the industries that use the product themselves (forward
linkages).&lt;/p>
&lt;pre>&lt;code>library(dplyr)  # provides %&amp;gt;%, select(), filter(), arrange()
direct_effects_create( primary_inputs_sk, I_sk ) %&amp;gt;%
  select ( all_of(c(&amp;quot;iotables_row&amp;quot;, &amp;quot;agriculture&amp;quot;,
                    &amp;quot;motor_vechicles&amp;quot;, &amp;quot;research_development&amp;quot;))) %&amp;gt;%
  filter (.data$iotables_row %in% c(&amp;quot;gva_effect&amp;quot;, &amp;quot;wages_salaries_effect&amp;quot;,
                                    &amp;quot;imports_effect&amp;quot;, &amp;quot;output_effect&amp;quot;))
##            iotables_row agriculture motor_vechicles research_development
## 1        imports_effect   1.3684350       2.3028203            0.9764921
## 2 wages_salaries_effect   0.2713804       0.3183523            0.3828014
## 3            gva_effect   0.9669621       0.9790771            0.9669467
## 4         output_effect   2.2876287       3.9840251            2.2579634
&lt;/code>&lt;/pre>
&lt;p>Car manufacturing requires many imported components, so each extra unit of
demand will create a large amount of importing activity. R&amp;amp;D will create the
most local wages (and support the most jobs) because research is
job-intensive. As we can see, the effects on imports, wages, gross value
added (which will end up in the GDP) and output are very
different in these three sectors.&lt;/p>
&lt;p>This is not the total effect, because some of the increased production
will translate into income, which in turn will be used to create further
demand in all parts of the domestic economy. The total effect is
characterized by multipliers.&lt;/p>
&lt;p>Then solve for the multipliers:&lt;/p>
&lt;pre>&lt;code>multipliers_sk &amp;lt;- input_multipliers_create(
  primary_inputs_sk %&amp;gt;%
    filter (.data$iotables_row == &amp;quot;gva&amp;quot;), I_sk )
&lt;/code>&lt;/pre>
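&lt;p>As a rough illustration of what such a multiplier expresses, the base R sketch below uses one common textbook definition: the GVA effect of an extra unit of final demand, divided by the sector’s own direct GVA coefficient. All names and numbers are made up, and the package functions may differ from this simplification in detail.&lt;/p>
&lt;pre>&lt;code># toy input coefficient matrix (made-up numbers, not Eurostat data)
sectors &amp;lt;- c('agriculture', 'manufacturing', 'services')
A &amp;lt;- matrix(c(0.10, 0.10, 0.02,
              0.15, 0.20, 0.10,
              0.05, 0.15, 0.20),
            nrow = 3, byrow = TRUE,
            dimnames = list(sectors, sectors))
L &amp;lt;- solve(diag(3) - A)  # Leontief inverse

# hypothetical gross value added per unit of output in each sector
gva_coeff &amp;lt;- c(agriculture = 0.50, manufacturing = 0.30, services = 0.60)

# GVA generated across the economy by one extra unit of final demand
gva_effect &amp;lt;- as.numeric(gva_coeff %*% L)
names(gva_effect) &amp;lt;- sectors

# multiplier: economy-wide effect relative to the direct coefficient
gva_multiplier &amp;lt;- gva_effect / gva_coeff
round(gva_multiplier, 2)
&lt;/code>&lt;/pre>
&lt;p>Under this definition a sector with a small direct GVA share but strong domestic supply chains receives a large multiplier, which is one way to read the high value for car manufacturing in the sample of multipliers shown in this post.&lt;/p>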
&lt;p>And select a few industries:&lt;/p>
&lt;pre>&lt;code>set.seed(12)
multipliers_sk %&amp;gt;%
  tidyr::pivot_longer ( -all_of(&amp;quot;iotables_row&amp;quot;),
                        names_to = &amp;quot;industry&amp;quot;,
                        values_to = &amp;quot;GVA_multiplier&amp;quot;) %&amp;gt;%
  select (-all_of(&amp;quot;iotables_row&amp;quot;)) %&amp;gt;%
  arrange( -.data$GVA_multiplier) %&amp;gt;%
  # draw 8 industries at random; the sample is not returned in sorted order
  dplyr::sample_n(8)
## # A tibble: 8 x 2
##   industry               GVA_multiplier
##   &amp;lt;chr&amp;gt;                           &amp;lt;dbl&amp;gt;
## 1 motor_vechicles                  7.81
## 2 wood_products                    2.27
## 3 mineral_products                 2.83
## 4 human_health                     1.53
## 5 post_courier                     2.23
## 6 sewage                           1.82
## 7 basic_metals                     4.16
## 8 real_estate_services_b           1.48
&lt;/code>&lt;/pre>
&lt;h2 id="vignettes">Vignettes&lt;/h2>
&lt;p>The &lt;a href="https://iotables.dataobservatory.eu/articles/germany_1990.html" target="_blank" rel="noopener">Germany
1990&lt;/a>
vignette provides an introduction to input-output economics and re-creates the
examples of the &lt;a href="https://iotables.dataobservatory.eu/articles/germany_1990.html" target="_blank" rel="noopener">Eurostat Manual of Supply, Use and Input-Output
Tables&lt;/a>,
by Jörg Beutel (Eurostat Manual).&lt;/p>
&lt;p>The &lt;a href="https://iotables.dataobservatory.eu/articles/united_kingdom_2010.html" target="_blank" rel="noopener">United Kingdom Input-Output Analytical Tables&lt;/a>
vignette by Daniel Antal, based on the work edited by Richard Wild,
is a use case on how to correctly import data from outside Eurostat
(i.e. not with &lt;code>eurostat::get_eurostat()&lt;/code>) and join it properly to a
SIOT. We also used this example to create unit tests of our functions
from a published, official government statistical release.&lt;/p>
&lt;p>Finally, &lt;a href="https://iotables.dataobservatory.eu/articles/working_with_eurostat.html" target="_blank" rel="noopener">Working With Eurostat
Data&lt;/a>
is a detailed use case that works through all the current functionalities
of the package by comparing two economies, Czechia and Slovakia, and
guides you through many more examples than this short blog post.&lt;/p>
&lt;p>Our package was originally developed to calculate GVA and employment
effects for the Slovak music industry (see our &lt;a href="https://music.dataobservatory.eu/publication/slovak_music_industry_2019/" target="_blank" rel="noopener">Slovak Music Industry Report&lt;/a>), and similar calculations for the
Hungarian film tax shelter. We can now programmatically create
reproducible multipliers for all European economies in the &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital
Music Observatory&lt;/a>, and create
further indicators for economic policy making in the &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data
Observatory&lt;/a>.&lt;/p>
&lt;h2 id="environmental-impact-analysis">Environmental Impact Analysis&lt;/h2>
&lt;p>Our package allows the calculation of various economic policy scenarios,
such as the effects of changing the VAT on meat or of re-opening music
festivals on aggregate demand, GDP, tax revenues, or employment. But
what about the CO&lt;sub>2&lt;/sub>, methane and other greenhouse gas
effects of reopening festivals or of increasing meat prices?&lt;/p>
&lt;p>Technically, our package can already calculate such effects, but to do
so, you have to carefully match further statistical vocabulary items
used by the European Environment Agency for air pollutants and
greenhouse gases.&lt;/p>
&lt;p>The latest released version of &lt;em>iotables&lt;/em> is archived on Zenodo as
Importing and Manipulating Symmetric Input-Output Tables (Version 0.4.4),
&lt;a href="https://zenodo.org/record/4897472" target="_blank" rel="noopener">https://doi.org/10.5281/zenodo.4897472&lt;/a>,
and we are already working on a new major release. In that release, we
are planning to build the necessary vocabulary into the metadata
functions to increase the functionality of the package, and create new
indicators for our &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data
Observatory&lt;/a>. This experimental
data observatory is creating new, high quality statistical indicators
from open governmental and open science data sources that have not seen
daylight yet.&lt;/p>
&lt;h2 id="ropengov-and-the-eu-datathon-challenges">rOpenGov and the EU Datathon Challenges&lt;/h2>
&lt;figure id="figure-ropengov-reprex-and-other-open-collaboration-partners-teamed-up-to-build-on-our-expertise-of-open-source-statistical-software-development-further-we-want-to-create-a-technologically-and-financially-feasible-data-as-service-to-put-our-reproducible-research-products-into-wider-user-for-the-business-analyst-scientific-researcher-and-evidence-based-policy-design-communities">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/partners/rOpenGov-intro.png" alt="rOpenGov, Reprex, and other open collaboration partners teamed up to build on our expertise of open source statistical software development further: we want to create a technologically and financially feasible data-as-service to put our reproducible research products into wider use for the business analyst, scientific researcher and evidence-based policy design communities." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
rOpenGov, Reprex, and other open collaboration partners teamed up to build on our expertise of open source statistical software development further: we want to create a technologically and financially feasible data-as-service to put our reproducible research products into wider use for the business analyst, scientific researcher and evidence-based policy design communities.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;a href="http://ropengov.org/" target="_blank" rel="noopener">rOpenGov&lt;/a> is a community of open governmental
data and statistics developers with many packages that make programmatic
access to, and work with, open data possible in the R language.
&lt;a href="https://reprex.nl/" target="_blank" rel="noopener">Reprex&lt;/a> is a Dutch startup that teamed up with
rOpenGov and other open collaboration partners to create a
technologically and financially feasible service to exploit reproducible
research products for the wider business, scientific and evidence-based
policy design community. Open data is a legal concept: it means that
you have the right to reuse the data, but often the reuse requires
significant programming and statistical know-how. We entered all three
challenges of the annual &lt;a href="https://reprex.nl/project/eu-datathon_2021/" target="_blank" rel="noopener">EU Datathon&lt;/a>
competition with applications that provide not only
open-source software, but also daily updated, validated, documented,
high-quality statistical indicators as open data in an open database.
Our &lt;a href="https://iotables.dataobservatory.eu/" target="_blank" rel="noopener">iotables&lt;/a> package is one of
our many open-source building blocks to make open data more accessible
to all.&lt;/p>
&lt;p>&lt;em>Join our open collaboration Digital Music Observatory team as a &lt;a href="https://music.dataobservatory.eu/authors/curator" target="_blank" rel="noopener">data curator&lt;/a>, &lt;a href="https://music.dataobservatory.eu/authors/developer" target="_blank" rel="noopener">developer&lt;/a> or &lt;a href="https://music.dataobservatory.eu/authors/team" target="_blank" rel="noopener">business developer&lt;/a>. More interested in environmental impact analysis? Try our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> team! Or economic policies, particularly computational antitrust, innovation and small enterprises? Check out our &lt;a href="https://economy.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Economy Data Observatory&lt;/a> team!&lt;/em>&lt;/p></description></item><item><title>New Indicators for Computational Antitrust</title><link>https://reprex-next.netlify.app/post/2021-06-02-data-curator-peter-ormosi/</link><pubDate>Wed, 02 Jun 2021 17:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-06-02-data-curator-peter-ormosi/</guid><description>&lt;p>&lt;strong>As someone who’s worked in data for almost 20 years, what type of data do you usually use in your research?&lt;/strong>&lt;/p>
&lt;p>In my field (industrial organisation, competition policy), company level financial data, and product price and sales data have been the conventional building blocks of research papers. Ideally this has been the sort of data that I would seek out for my work. Of course as academic researchers we often get knocked back by the reality of data access and availability. I would think that industrial organisation is one of those fields where researchers have to be quite innovative in terms of answering interesting and relevant policy questions, whilst having to operate in an environment where most relevant data is proprietary and very expensive. Against this backdrop, I have worked with neatly organised proprietary datasets, self-assembled data collections, and also textual data.&lt;/p>
&lt;p>&lt;strong>From your experience working with various data sets, models, and frameworks, what would be the ultimate dataset, or datasets that you would like to see from the Economy Data Observatory?&lt;/strong>&lt;/p>
&lt;p>There seems to be an emerging consensus that market concentration and markups have been continuously increasing across the economy. But most of these works use industry classification to define markets. One of the things I’d really like to see coming out of the Economy Data Observatory is a mapping of what we call antitrust markets.&lt;/p>
&lt;figure id="figure-mapping-nace-to-antitrust-markets">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Mapping NACE to Antitrust Markets." srcset="
/media/img/blogposts_2021/nace_antitrust_map_hua71260c5d6d1149b51eafc96ae45b10a_141688_66cd552939abcbdfbcb0dec263a391b5.webp 400w,
/media/img/blogposts_2021/nace_antitrust_map_hua71260c5d6d1149b51eafc96ae45b10a_141688_3e0e86b0fc3cd17edf80fe4e77bb19d2.webp 760w,
/media/img/blogposts_2021/nace_antitrust_map_hua71260c5d6d1149b51eafc96ae45b10a_141688_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/nace_antitrust_map_hua71260c5d6d1149b51eafc96ae45b10a_141688_66cd552939abcbdfbcb0dec263a391b5.webp"
width="760"
height="421"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Mapping NACE to Antitrust Markets.
&lt;/figcaption>&lt;/figure>
&lt;p>Available datasets use standard industry classification (such as NACE in the EU), which is often very different from what we call a product market in microeconomics. Product markets are defined by demand-side and supply-side substitutability, which is a dynamically evolving feature and difficult to capture systematically on a wider scale. But with the recent proliferation of data and the growth (and fall in price) of computing power, I am positive that we could attempt to map out the European economy along these product market boundaries. Of course this is not without challenges. For example in digital markets, traditional ways to define markets have caused serious challenges to competition authorities around the world.&lt;/p>
&lt;p>I believe that there is an immensely rich, and largely unexplored source of information in unstructured textual data that would be hugely useful for applied microeconomic works, including my own area of IO and competition policy. This includes a large corpus of administrative and court decisions that relate to businesses, such as merger control decisions of the European Commission. To give two examples from my experience, we’ve used a large corpus of news reports related to various firms to gauge the reputational impact of European Commission cartel investigations, or we’ve trained an algorithm to be able to classify US legislative bills and predict whether they have been lobbied or not. Finding a way to collect and convert this unstructured data into a format that is relevant and useful for users is not a trivial challenge, but is one of the most exciting parts of our Economy Data Observatory plans (see related &lt;a href="https://economy.dataobservatory.eu/post/2021-06-02-data-curator-peter-ormosi/" target="_blank" rel="noopener">project plan&lt;/a>).&lt;/p>
&lt;figure id="figure-finding-a-way-to-collect-and-convert-this-unstructured-data-into-a-format-that-is-relevant-and-useful-for-users-is-not-a-trivial-challenge-but-is-one-of-the-most-exciting-parts-of-our-economy-data-observatory-plans">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Finding a way to collect and convert this unstructured data into a format that is relevant and useful for users is not a trivial challenge, but is one of the most exciting parts of our Economy Data Observatory plans." srcset="
/media/img/blogposts_2021/lobbying_activity_huaca4ce6dfe71dde6ecb4cb216aeea2cb_73810_114e11f8942bc563c860bdcb866e7252.webp 400w,
/media/img/blogposts_2021/lobbying_activity_huaca4ce6dfe71dde6ecb4cb216aeea2cb_73810_0326263e6a52a089e421bae6c60ad6df.webp 760w,
/media/img/blogposts_2021/lobbying_activity_huaca4ce6dfe71dde6ecb4cb216aeea2cb_73810_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/lobbying_activity_huaca4ce6dfe71dde6ecb4cb216aeea2cb_73810_114e11f8942bc563c860bdcb866e7252.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Finding a way to collect and convert this unstructured data into a format that is relevant and useful for users is not a trivial challenge, but is one of the most exciting parts of our Economy Data Observatory plans.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>What is an idea that you consider will be a game changer for researchers and/or policymakers?&lt;/strong>&lt;/p>
&lt;p>Partly talking in the past tense, the use of data-driven approaches, automation in research, and machine learning have been increasingly influential and I think this trend will extend to all areas of social science. 10 years ago, to do machine learning, you had to build your models from scratch, typically requiring a solid understanding of programming and linear algebra. Today, there are readily available deep learning frameworks like TensorFlow, Keras, PyTorch, to design a neural network for your own application. 10 years ago, natural language processing would have only been relevant for a small group of computational linguists. Today we have massive word embedding models trained on an enormous corpus of texts, at the fingertips of any researcher. 10 years ago, the cost of computing power would have made it prohibitive for most researchers to run even relatively shallow neural networks. Today, I can run complex deep learning models on my laptop using cloud computing servers. As a result of these developments, whereas 10 years ago one would have needed a small (or large) research team to explore certain research questions, much of this can now be automated and done by a single researcher. For researchers without access to large research grants and without the ability to hire a research team, this has truly been an amazing victory for the democratisation of research.&lt;/p>
&lt;figure id="figure-you-can-already-try-out-our-api">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/EDO_API_indicator_table.png" alt="You can already try out our API." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
You can already try out our API.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>Do you have a favorite, or most used open governmental or open science data source? What do you think about it? Could it be improved?&lt;/strong>&lt;/p>
&lt;p>As a competition economist, I tend to need very specific data for each research question I’m working on, which has to be collected from scratch. On the other hand, most works do require us to use data that has already been collected and made available. For example, access to census data has been immensely useful in ensuring that we can control for local demographic features, in papers where local competition plays a role. Census data is made readily available by most governments, but I particularly liked the Australian data, partly because they run a census every 5 years, but also because they have made the data available through a great table making tool.&lt;/p>
&lt;p>&lt;strong>Is there a number that recently surprised you? What was it?&lt;/strong>&lt;/p>
&lt;p>I have these moments of surprise fairly frequently. To give one example from something I&amp;rsquo;m currently working on, looking at the distributional impact of increasing market concentration, we’ve found that low income households experience a larger increase in the petrol retail margin when market concentration increases than high income households. This fits nicely with theoretical works on search in homogeneous costs, i.e. low income households are less good at engaging with the market, and, as a result, if suppliers can price discriminate, they will charge a higher margin to these households.&lt;/p>
&lt;p>The figure below shows our raw data (18 years of petrol station level daily price data from Western Australia) for low and high income areas, and the increase in the margin following an increase in market concentration (vertical dotted line). The left hand side (low income areas) displays a large increase in the margin when compared to a control group, whereas the right hand side (high income areas) shows no change. In our paper, of course, we build a fairly data intensive quasi-experiment to identify the treatment effect of changing market concentration on the price margin applied to various demographic groups.&lt;/p>
&lt;figure id="figure-surprising-findings-market-concentration-and-margin-changes-for-petrol-stations">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Surprising findings: market concentration and margin changes for petrol stations." srcset="
/media/img/blogposts_2021/retail_margins_hu4f7a420529861d0e923a9b45359f33fd_73580_00c6b301134458376098673460585fff.webp 400w,
/media/img/blogposts_2021/retail_margins_hu4f7a420529861d0e923a9b45359f33fd_73580_d3d458b83caa89066277276d706b9d0c.webp 760w,
/media/img/blogposts_2021/retail_margins_hu4f7a420529861d0e923a9b45359f33fd_73580_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://reprex-next.netlify.app/media/img/blogposts_2021/retail_margins_hu4f7a420529861d0e923a9b45359f33fd_73580_00c6b301134458376098673460585fff.webp"
width="760"
height="565"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Surprising findings: market concentration and margin changes for petrol stations.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>Do you have a good example of really good, or really bad use of data science /data curation?&lt;/strong>&lt;/p>
&lt;p>Out of professional courtesy I really wouldn’t like to mention names from academic research as examples of bad use of data. But there are ample examples from newspaper coverage of data related work, or simply the misuse of data by newspapers. This may be intentional but is often a result of journalists not having the necessary training in using and analysing data.&lt;/p>
&lt;p>When the press finds a piece of academic research interesting, often bad things come out of it. This is often because not all journalists are well equipped to interpret scientific findings. As a result, conclusions are sometimes drawn from a misinterpretation of good data analysis. Correlation interpreted as causation is a frequently recurring example. Equally bad is when press coverage changes the incentives for producing good research: scientists work too hard for their work to be noticed by the press, and sacrifice scientific rigour in data analysis for the sake of media attention. There can also be less discernible but equally damaging errors.&lt;/p>
&lt;p>Requiring researchers to pre-disclose the tests they are going to run on the data helps maintain credibility in many instances. Moreover, I am always a bit suspicious if the authors do not give access to their data for reproduction.&lt;/p>
&lt;figure id="figure-our-economy-data-observatory-places-all-new-indicators-on-zenodo-with-a-doi-and-asks-future-individual-contributors-their-data-for-replication-there">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/edo_and_zenodo.jpg" alt="Our Economy Data Observatory places all new indicators on Zenodo with a DOI, and asks future individual contributors to place their data for replication there." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our Economy Data Observatory places all new indicators on Zenodo with a DOI, and asks future individual contributors to place their data for replication there.
&lt;/figcaption>&lt;/figure>
&lt;p>&lt;strong>What do you see as the greatest challenge with open data in 2021?&lt;/strong>&lt;/p>
&lt;p>The democratisation of research that I mentioned above, driven by automation and access to big data, raises serious challenges as well. The obvious one has to do with the enormous economies of scale in the use of data. As such, larger players will always be better positioned to outdo their smaller competitors, simply as a result of their superior data and infrastructure (for example, having more granular consumer data allows them to offer a better designed, customised experience for the consumer). Like many others, I see this as the biggest challenge for open data: to level the playing field for smaller players. This is not a trivial task at all; and even if, miraculously, small businesses could access the same data as the biggest players, they still would not have the capacity or the ability to use this data. So allowing access to data alone is unlikely to solve any of these problems. I would say that fostering engagement with open data is probably as big a challenge as creating the open data in the first place.&lt;/p>
&lt;p>&lt;strong>How do you envision the Economy Data Observatory making open data more credible in the European economic policy community and accepted as verified information?&lt;/strong>&lt;/p>
&lt;p>I think starting with a focused agenda is a good idea. For example, linking up with the Centre for Competition Policy means that we have an initial focus on economic data relevant to competition policy. This is still a large domain, but it is one where we have ample expertise. Starting with specific research questions such as linking competition enforcement and merger decisions to related information on innovation and ownership data puts the Economy Data Observatory at the heart of some of the most topical policy questions, such as the role of killer acquisitions (acquisitions with the intent to kill off sources of rival innovation), or common ownership, both of which are increasingly discussed in policy and practitioner circles. Once we have established ourselves as a credible source of data in the competition policy community, we can look into joining this up with other policy areas, and also with our other Data Observatories (&lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Music&lt;/a> and &lt;a href="https://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal&lt;/a>).&lt;/p>
&lt;figure id="figure-join-our-open-collaboration-economy-data-observatory-team-as-a-data-curatorauthorscurator-developerauthorsdeveloper-or-business-developerauthorsteam-or-share-your-data-in-our-public-repository-economy-data-observatory-on-zenodohttpszenodoorgcommunitieseconomy_observatory">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/img/observatory_screenshots/edo_and_zenodo.png" alt="Join our open collaboration Economy Data Observatory team as a [data curator](/authors/curator), [developer](/authors/developer) or [business developer](/authors/team), or share your data in our public repository [Economy Data Observatory on Zenodo](https://zenodo.org/communities/economy_observatory/)" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>, or share your data in our public repository &lt;a href="https://zenodo.org/communities/economy_observatory/" target="_blank" rel="noopener">Economy Data Observatory on Zenodo&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;h2 id="join-us">Join us&lt;/h2>
&lt;p>&lt;em>Join our open collaboration Economy Data Observatory team as a &lt;a href="https://reprex-next.netlify.app/authors/curator">data curator&lt;/a>, &lt;a href="https://reprex-next.netlify.app/authors/developer">developer&lt;/a> or &lt;a href="https://reprex-next.netlify.app/authors/team">business developer&lt;/a>. More interested in environmental impact analysis? Try our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> team! Or your interest lies more in data governance, trustworthy AI and other digital market problems? Check out our &lt;a href="https://music.dataobservatory.eu/#contributors" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> team!&lt;/em>&lt;/p></description></item><item><title>Reprex is Contesting all Three Challenges of the EU Datathon 2021 Prize</title><link>https://reprex-next.netlify.app/post/2021-05-21-eu-datathon-2021/</link><pubDate>Fri, 21 May 2021 20:00:00 +0000</pubDate><guid>https://reprex-next.netlify.app/post/2021-05-21-eu-datathon-2021/</guid><description>&lt;p>Reprex, a Dutch start-up enterprise formed to utilize open source software and open data, is looking for partners in an agile, open collaboration to win at least one of the three EU Datathon Prizes. We are looking for policy partners, academic partners and a consultancy partner. Our project is based on agile, open collaboration with three types of contributors.&lt;/p>
&lt;p>With our competing prototypes we want to show that we have a research automation technology that can find open data, process it and validate it into high-quality business, policy or scientific indicators, and release it with daily updates through a modern API.&lt;/p>
&lt;p>We are looking for institutions to challenge us with their data problems, and sponsors to increase our capacity. Over the next 5 months, we need to find a sustainable business model for a high-quality and open alternative to other public data programs.&lt;/p>
&lt;h2 id="the-eu-datathon-2021-challenge">The EU Datathon 2021 Challenge&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;em>To take part, you should propose the development of an application that links and uses open datasets.&lt;/em> - our &lt;a href="https://music.dataobservatory.eu/#contributors" target="_blank" rel="noopener">data curator team&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Your application &amp;hellip; is also expected to find suitable new approaches and solutions to help Europe achieve important goals set by the European Commission through the use of open data.&lt;/em> - this application is developed by our &lt;a href="https://greendeal.dataobservatory.eu/#contributors" target="_blank" rel="noopener">technology contributors&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Your application should showcase opportunities for concrete business models or social enterprises.&lt;/em> - our &lt;a href="https://economy.dataobservatory.eu/#contributors" target="_blank" rel="noopener">service development team&lt;/a> is working to make this happen!&lt;/p>
&lt;/li>
&lt;li>
&lt;p>We use open source software and open data. The applications are hosted on the cloud resources of &lt;a href="#reprex">Reprex&lt;/a>, an early-stage technology startup currently building a viable, open-source, open-data business model to create reproducible research products.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>We are working together with experts in the domain as curators (check out our guidelines if you want to join: &lt;a href="https://curators.dataobservatory.eu/data-curators.html" target="_blank" rel="noopener">Data Curators: Get Inspired!&lt;/a>).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Our development team works on an open collaboration basis. Our indicator R packages, and our services are developed together with &lt;a href="https://music.dataobservatory.eu/author/ropengov/" target="_blank" rel="noopener">rOpenGov&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="mission-statement">Mission statement&lt;/h2>
&lt;p>We want to win an &lt;a href="https://op.europa.eu/en/web/eudatathon" target="_blank" rel="noopener">EU Datathon prize&lt;/a> by processing the vast, already-available governmental and scientific open data and making it usable for policy-makers, scientific researchers, and business research end-users.&lt;/p>
&lt;p>“&lt;em>To take part, you should propose the development of an application that links and uses open datasets. Your application should showcase opportunities for concrete business models or social enterprises. It is also expected to find suitable new approaches and solutions to help Europe achieve important goals set by the European Commission through the use of open data.&lt;/em>”&lt;/p>
&lt;p>We aim to win at least one first prize in the EU Datathon 2021. We are contesting &lt;strong>all three&lt;/strong> challenges, which are related to the EU’s official strategic policies for the coming decade.&lt;/p>
&lt;h2 id="challenge-1-a-european-grean-deel">Challenge 1: A European Green Deal&lt;/h2>
&lt;figure id="figure-our-green-deal-data-observatory-connects-socio-economic-and-environmental-data-to-help-understanding-and-combating-climate-change">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/observatory_screenshots/GD_Observatory_opening_page.png" alt="Our Green Deal Data Observatory connects socio-economic and environmental data to help understand and combat climate change." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our Green Deal Data Observatory connects socio-economic and environmental data to help understand and combat climate change.
&lt;/figcaption>&lt;/figure>
&lt;p>Challenge 1: &lt;a href="https://ec.europa.eu/info/strategy/priorities-2019-2024/european-green-deal_en" target="_blank" rel="noopener">A European Green Deal&lt;/a>, with a particular focus on the &lt;a href="https://ec.europa.eu/commission/presscorner/detail/en/ip_20_2323" target="_blank" rel="noopener">European Climate Pact&lt;/a>, the &lt;a href="https://ec.europa.eu/info/food-farming-fisheries/farming/organic-farming/organic-action-plan_en" target="_blank" rel="noopener">Organic Action Plan&lt;/a>, and the &lt;a href="https://ec.europa.eu/commission/presscorner/detail/en/IP_21_111" target="_blank" rel="noopener">New European Bauhaus&lt;/a>, i.e., mitigation strategies.&lt;/p>
&lt;p>Climate change and environmental degradation are an existential threat to Europe and the world. To overcome these challenges, the European Union created the European Green Deal strategic plan, which aims to make the EU’s economy sustainable by turning climate and environmental challenges into opportunities and making the transition just and inclusive for all.&lt;/p>
&lt;p>Our &lt;a href="http://greendeal.dataobservatory.eu/" target="_blank" rel="noopener">Green Deal Data Observatory&lt;/a> is a modern reimagination of existing ‘data observatories’; currently, there are over 70 permanent international data collection and dissemination points. One of our objectives is to understand why dozens of the EU’s observatories do not use open data and reproducible research. We want to show that open governmental data, open science, and reproducible research can lead to a higher quality and faster data ecosystem that fosters growth for policy, business, and academic data users.&lt;/p>
&lt;p>We provide high quality, tidy data through a modern API which enables data flows between public and proprietary databases. We believe that introducing Open Policy Analysis standards with open data, open-source software, and research automation can help the Green Deal policymaking process. Our collaboration is open for individuals, citizen scientists, research institutes, NGOs, and companies.&lt;/p>
&lt;h2 id="challenge-2-a-europe-fit-for-the-digital-age">Challenge 2: An economy that works for people&lt;/h2>
&lt;figure id="figure-our-economy-data-observatory-will-focus-on-competition-small-and-medium-sized-enterprizes-and-robotization">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/observatory_screenshots/edo_opening_page.jpg" alt="Our Economy Data Observatory will focus on competition, small and medium-sized enterprises and robotization." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our Economy Data Observatory will focus on competition, small and medium-sized enterprises and robotization.
&lt;/figcaption>&lt;/figure>
&lt;p>Challenge 2: &lt;a href="https://ec.europa.eu/info/strategy/priorities-2019-2024/economy-works-people_en#:~:text=Individuals%20and%20businesses%20in%20the,needs%20of%20the%20EU%27s%20citizens." target="_blank" rel="noopener">An economy that works for people&lt;/a>, with a particular focus on the &lt;a href="https://ec.europa.eu/info/strategy/priorities-2019-2024/economy-works-people/internal-market_en" target="_blank" rel="noopener">Single market strategy&lt;/a>, and particular attention to the strategy’s goals of 1. Modernising our standards system, 2. Consolidating Europe’s intellectual property framework, and 3. Enabling the balanced development of the collaborative economy.&lt;/p>
&lt;p>Big data and automation create new inequalities and injustices and have the potential to create a jobless growth economy. Our &lt;a href="https://economy.dataobservatory.eu/" target="_blank" rel="noopener">Economy Data Observatory&lt;/a> is a fully automated, open source, open data observatory that produces new indicators from open data sources and experimental big data sources, with authoritative copies and a modern API.&lt;/p>
&lt;p>Our observatory monitors the European economy to protect consumers and small companies from unfair competition arising from both data and knowledge monopolization and robotization. We take a critical view of automation, robotization, and the AI revolution in the service-oriented European social market economy from the perspective of small and medium-sized enterprises (SMEs), intellectual property, and competition policy.&lt;/p>
&lt;p>We would like to create early-warning, risk, economic effect, and impact indicators that can be used in scientific, business, and policy contexts for professionals who are working on re-setting the European economy after a devastating pandemic in the age of AI. We are particularly interested in designing indicators that can be early warnings for killer acquisitions, algorithmic and offline discrimination against consumers based on nationality or place of residence, and signs of undermining key economic and competition policy goals. Our goal is to help small and medium-sized enterprises and start-ups to grow, and to furnish data that encourages the financial sector to provide loans and equity funds for their growth.&lt;/p>
&lt;h2 id="challenge-3-a-europe-fit-for-the-digital-age">Challenge 3: A Europe fit for the digital age&lt;/h2>
&lt;figure id="figure-our-digital-music-observatory-is-not-only-a-demo-of-the-european-music-observatory-but-a-testing-ground-for-data-governance-digital-servcies-act-and-trustworthy-ai-problems">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/observatory_screenshots/dmo_opening_screen.png" alt="Our Digital Music Observatory is not only a demo of the European Music Observatory, but a testing ground for data governance, Digital Services Act, and trustworthy AI problems." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our Digital Music Observatory is not only a demo of the European Music Observatory, but a testing ground for data governance, Digital Services Act, and trustworthy AI problems.
&lt;/figcaption>&lt;/figure>
&lt;p>Challenge 3: &lt;a href="https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age_en" target="_blank" rel="noopener">A Europe fit for the digital age&lt;/a>, with a particular focus on &lt;a href="https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/excellence-trust-artificial-intelligence_en" target="_blank" rel="noopener">Artificial Intelligence&lt;/a>, the &lt;a href="https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/european-data-strategy_en" target="_blank" rel="noopener">European Data Strategy&lt;/a>, the &lt;a href="https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/digital-services-act-ensuring-safe-and-accountable-online-environment_en" target="_blank" rel="noopener">Digital Services Act&lt;/a>, &lt;a href="https://digital-strategy.ec.europa.eu/en/policies/digital-skills-and-jobs" target="_blank" rel="noopener">Digital Skills&lt;/a> and &lt;a href="https://digital-strategy.ec.europa.eu/en/policies/connectivity" target="_blank" rel="noopener">Connectivity&lt;/a>.&lt;/p>
&lt;p>The &lt;a href="https://music.dataobservatory.eu/" target="_blank" rel="noopener">Digital Music Observatory&lt;/a> (DMO) is a fully automated, open source, open data observatory that creates public datasets to provide a comprehensive view of the European music industry. It provides high-quality and timely indicators in all four pillars of the planned official European Music Observatory as a modern, open source and largely open data-based, automated, API-supported alternative solution for this planned observatory. The insight and methodologies we are refining in the DMO are applicable and transferable to about 60 other data observatories funded by the EU which do not currently employ governmental or scientific open data.&lt;/p>
&lt;p>Music is one of the most data-driven service industries where most sales are currently executed by AI-driven autonomous systems that influence market shares and intellectual property remuneration. We provide a template that enables making these AI-driven systems accountable and trustworthy, with the goal of re-balancing the legitimate interests of creators, distributors, and consumers. Within Europe, this new balance will be an important use case of the European Data Strategy and the Digital Services Act.&lt;/p>
&lt;p>The DMO is a fully functional service that can serve as a testing ground of the European Data Strategy. It can showcase the ways in which the music industry is affected by the problems that the Digital Services Act and European Trustworthy AI initiatives attempt to regulate. It is being built in open collaboration with national music stakeholders, NGOs, academic institutions, and industry groups.&lt;/p>
&lt;p>Our Product/Market Fit was validated in the world’s 2nd ranked university-backed incubator program, the &lt;a href="https://music.dataobservatory.eu/post/2020-09-25-yesdelft-validation/" target="_blank" rel="noopener">Yes!Delft AI Validation Lab&lt;/a>. We are currently developing this project with the help of the &lt;a href="https://www.jumpmusic.eu/fellow2021/automated-music-observatory/" target="_blank" rel="noopener">JUMP European Music Market Accelerator&lt;/a> program.&lt;/p>
&lt;h2 id="problem-statement">Problem Statement&lt;/h2>
&lt;p>The EU has an 18-year-old open data regime, and it makes taxpayer-funded data worth tens of billions of euros public every year; the Eurostat program alone handles 20,000 international data products, including at least 5,000 pan-European environmental indicators.&lt;/p>
&lt;p>As open science principles gain increased acceptance, scientific researchers are making hundreds of thousands of valuable datasets public and available for replication every year.&lt;/p>
&lt;p>The EU, the OECD, and UN institutions run around 100 data collection programs, so-called ‘data observatories’ that more or less avoid touching this data, and buy proprietary data instead. Annually, each observatory spends between 50 thousand and 3 million EUR on collecting untidy and proprietary data of inconsistent quality, while never even considering open data.&lt;/p>
&lt;figure id="figure-our-automated-data-observatories-are-modern-reimaginations-of-the-existing-observatories-that-do-not-use-open-data-and-research-automation">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://reprex-next.netlify.app/media/img/observatory_screenshots/observatory_collage_16x9_800.png" alt="Our automated data observatories are modern reimaginations of the existing observatories that do not use open data and research automation." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption data-pre="Figure&amp;nbsp;" data-post=":&amp;nbsp;" class="numbered">
Our automated data observatories are modern reimaginations of the existing observatories that do not use open data and research automation.
&lt;/figcaption>&lt;/figure>
&lt;p>The problem with the current EU data strategy is that while it produces enormous quantities of valuable open data, in the absence of common basic data science and documentation principles, it often seems cheaper to create new data than to put the existing open data into shape.&lt;/p>
&lt;p>This is an absolute waste of resources and efforts. With a few R packages and our deep understanding of advanced data science techniques, we can create valuable datasets from unprocessed open data. In most domains, we are able to repurpose data originally created for other purposes at a historical cost of several billions of euros, converting these unused data assets into valuable datasets that can replace tens of millions’ worth of proprietary data.&lt;/p>
&lt;p>What we want to achieve with this project (and we believe such an accomplishment would merit one of the first prizes) is to add value to a significant portion of pre-existing EU open data (for example, available on &lt;a href="https://data.europa.eu/data/" target="_blank" rel="noopener">data.europa.eu/data&lt;/a>) by re-processing and integrating them into a modern, tidy database with API access, and to find a business model that emphasises a triangular use of data in 1. business, 2. science and 3. policy-making. Our mission is to modernize the concept of &lt;code>data observatories&lt;/code>.&lt;/p></description></item></channel></rss>