Daniel Antal, CFA | Reprex

Digital Music Observatory at the MaMA Convention 2021

Thu, 14 Oct 2021 11:00:00 +0000

Currently more than half of the global music sales are made by autonomous AI systems owned by Google, Apple, or Spotify. These data monopolies are getting rich because they reap the profit from music businesses that have an average employee count of 1.8 in Europe. European music businesses are easy to exploit with armies of data engineers and data scientists, because these businesses do not have a single data scientist or even an IT function.

Artists in the UK had difficulty explaining in Westminster how they are losing out in streaming, so we have created a streaming price index (like the Dow Jones, if you like) that explains the economic factors of the devaluation of music in the last 5 years in 20 countries. (See our report.)
Music organizations in Slovakia and Hungary were frustrated that their politicians and journalists believed music to be taxpayer funded, so we showed with data that they contribute proportionally more to the national budget than car manufacturers, the darlings of local politicians. (See our reports in Hungary (recast several times) and in Slovakia.)
With data, we successfully challenged restaurant associations, hotel chains, telecom corporations and broadcasters who wanted to bring music prices down in court and via lobbying.

The music industry has envied the television and film industry, which has a single go-to point for data when it needs them: the European Audiovisual Observatory. It started lobbying for a publicly financed music observatory. But we did not wait. The music industry has a tragic track record of failed centralized international data projects. We built Reprex out of a 12-country, decentralized music project. We learned how to make good use of hidden but already existing data and research funds, and how to manage data governance amid the poisonous conflicts of interest between rich and poor countries, authors vs producers, producers vs performers.

Our Digital Music Observatory is not theoretical, it is practical, because it is built around real-life court cases, damage claims, lobbying and PR arguments.
Our Digital Music Observatory is comprehensive – it contains more than a thousand indicators from all European countries. We have enough data to test the biases of the Spotify or the YouTube algorithm – you would be surprised what the data tells us.
It has data available much sooner, in much higher quality and in a more practical format than the European Audiovisual Observatory.

Presentation Slides

You can see the presentation slides here.

Crunchconf: Open Data, New Gold Without the Rush

Fri, 08 Oct 2021 10:10:00 +0000

Every year, the EU announces that billions and billions of data are now “open” again, but this is not gold, at least not in the form of nicely minted gold coins. It is gold dust and nuggets found in the muddy banks of chilly rivers. There is no rush for it, because panning out its value requires a lot of hours of hard work. Our goal is to automate this work to make open data usable at scale, even in trustworthy AI solutions.

Summary

In his presentation, Daniel compared the current state of open data (including governmental open data and scientific open data) to a thrift store. You can often find bargains, or historical data that would be impossible to source from data vendors, but on a strictly as-is basis, without a catalogue, service, or guarantee. Therefore, working with open data requires careful reprocessing, validation, and in many cases, frequent re-validation. Open data is often overestimated: it is never a finished product, and often it cannot even be downloaded, so it requires further investment to make it valuable. However, because most open data arrives from the governmental sector, you can tap into information sources where no market alternative exists. Open data in some cases may be a cheaper substitute for market vendors, but often it is an exclusive source of information that has no market vendors at all.

Sisyphus was punished by being forced to roll an immense boulder up a hill only for it to roll down every time it neared the top, repeating this action for eternity. This is the price that project managers and analysts pay for the inadequate documentation of their data assets.

The practices related to the exploitation of open data are not only relevant in an open data context: these are good data ingestion and procurement practices for any third-party data, and in large organizations, for any cross-departmental data. (See the blogpost: The Data Sisyphus.)

Case Study: Belgian Drought/Flood Risk Awareness, Financial Capacity & Hydrology, a complex integration of various open data sources.

In the second part of the presentation, Daniel talked about our modern data observatory concept. We have reviewed about 80 functioning and already defunct international data collection programs. Data observatories, like Copernicus’ Observatory, are permanent infrastructure to record various domain-specific data, such as alternative fuel information, information on homelessness, or on the European music business. In our assessment, most of the observatories recognized or endorsed by the EU, OECD or UNESCO use obsolete technology and do not rely on the new achievements of data science. Reprex, our start-up, offers an open-source, open-data-based alternative solution to build largely automated data observatories. We believe that human judgement is needed in data curation, but processing, documentation and validation are best done by computers.

Case Study: Reprocessing geographical information with administrative boundary changes

Finally, he presented a few development directions for our open-source software, mentioning our work within the rOpenGov community. This part of the presentation was originally meant to open the way for a half-day open data workshop, but due to the current pandemic situation, the physical part of the conference and the workshops were not held.

The presentation largely included the topics of our Data & Lyrics blogpost: Open Data—The New Gold Without the Rush

Presentation Slides

See the presentation slides here.

Reprex introduction at IViR

Tue, 02 Feb 2021 10:10:00 +0000

IViRtual 9 April 2021

Product/Market Fit Validation in Yes!Delft

Fri, 25 Sep 2020 15:31:39 +0000

We would like to validate our product/market fit in two segments, business/policy research and scientific research, with a supporting role given to data journalism. Because we want to follow a bootstrapping strategy, we must focus on those clients where we find the highest value proposition, which is of course easier said than done. We see much interest in our offering from other continents, so we welcome the opportunity to do this on a truly global business canvas in one of the world’s top five incubators, the number 2 university-backed incubator in the world, second to none in Europe: the Yes!Delft AI+Blockchain Validation Lab.

In Europe, hundreds of thousands of microenterprises, such as record labels, video producers or book publishers, are facing data and AI giants like Google’s YouTube, Apple Music, Spotify, Netflix or Amazon. If the recommendation engines of these giants do not recommend their songs, films or books, then their investments are doomed to fail, because about half of the global sales are driven by AI algorithms. When they make a claim for the missing money, they immediately find themselves in a dispute involving gigabytes of data that they can only handle with a data scientist, even though they do not even have an IT professional, or an HR professional to make the hire.

An awful lot of money, creativity and real value is at stake, and we want to be on the creators’ side, their technicians’ side, their managers’ side when they want to get a fair share of the pie and to help these industry leaders make the pie grow.

UNESCO and the EU have been promoting so-called data observatories as an organizational solution to the fragmentation problem. These observatories pool the business, policy, and scientific research needs of various domains, like music. This is an idea that we really like, and we believe that our research automation solutions can help these observatories to grow faster as ecosystems and to create better quality and more timely data and research products at a far lower cost.

We define ourselves as a reproducible research company inspired by the philosophy of open collaboration, based on open-source software and open data. We want to explore various revenue models around these ideas.

We are not committed to open source licensing if more permissive licensing policies provide us with better opportunities.
We would like to explore various data-as-a-service models, because we do not want to be locked into the position of cheap open data vendors.
We want to deploy AI applications that really help earn money in these sectors with playlisting, recommendation engines, forecasting applications, or royalty valuations, because our open collaboration approach brings up enough data sooner than its alternatives, and because it manages inherent conflicts of interest, fragmentation, and decentralization better than hierarchical solutions.

Timeline

In January CEEMID reached its peak: we introduced a 12-country reproducible research project made with only freelancers in Brussels, presented as a best use case of evidence-based policy design.
In February Daniel visited the Yes!Delft Co-Lab to find out who would be the best co-founder to re-launch CEEMID as an enterprise.
In April we started to release our data as open data for validation.
One month ago we started up.
Then we launched the music.dataobservatory.eu project.
A few other data observatories will follow.

Bonus:

Palato in The Hague, where we took our selfie and had an absolutely amazing dinner after the pitch. Check them out!

Reproducible Survey Harmonization: retroharmonize Is Released

Mon, 21 Sep 2020 11:31:39 +0000

Our original intention was to make surveying more accessible for music and creative industry partners by relying more on already existing survey data and by better designing complementary, smaller surveys, because surveying and opinion polling are becoming increasingly expensive in the developed world. People are less and less likely to sit down for an interview in their homes. We have tried to harmonize our custom surveys, particularly those made with Kantar in Hungary and Focus in Slovakia, with existing EU projects. But we ended up making a part of international survey harmonization, across countries and throughout years, easier to automate.

Surveys are like sensors for natural sciences and industrial production. They are essential for almost any social and economic statistical indicator: for calculating inflation, parts of the GDP, or participation in education programs. Making surveys easier to harmonize and exploiting more of the already existing survey data can bring down research costs and increase research value at the same time. (See our earlier blog post Increase The Value Of Market Research With Open Data And Survey Harmonization.)

So, if you are an R user, you can use install.packages("retroharmonize") to get the released 0.1.13 version and try the tutorials with real Eurobarometer or Afrobarometer microdata. With devtools::install_github("antaldaniel/retroharmonize") you can already install the current development version 0.1.14, which handles Perl-like regular expressions; these will be necessary for our next tutorial, currently in the making, for Arab Barometer.

Starting-up

Mon, 24 Aug 2020 10:15:00 +0000

The big day has come: the co-founders signed off the documents at the public notary and started the registration of a reproducible research start-up in Leiden. We got a lot of support from our friends! Your encouragement gives us a lot of energy to accomplish our first milestones, and to get Reprex B.V. going!

Reprex means ‘reproducible example’ in data science. When you are stuck with a problem, creating a reproducible example allows other computer scientists, statisticians, programmers or data users to solve it. In 80% of the cases, you find the solution yourself while creating a generalized example. In the other 20%, you can easily reach out for help.

In the coming days, we are launching demo versions of our headline products, data observatories. music.dataobservatory.eu will be a fully automated online service that every day collects, processes, cleans, and publishes scientifically valid data about European music. Very soon after we will launch two other observatories.

Organizations in the creative and cultural sector, NGOs, most research institutions and data journalism teams are usually very small, and they do not have internal IT or data science capacities. We would like to provide them with a transparent, high-quality, and fully open-source solution to acquire data, process it without errors, document it and make sense of it. We would like to embrace the idea of open collaboration among creative enterprises, scientific researchers, NGOs, data journalists and policymakers with our work.

Our work will comply with the Open Policy Analysis standards developed by the Berkeley Initiative for Transparency in the Social Sciences & Center for Effective Global Action and the four principles of reproducible research: reviewability, replicability, confirmability and auditability. We believe that these standards apply equally to reproducible finance, the presentation of empirical evidence in courts, the advocacy of sound policies and the production of high-quality journalism.

Do you want to help our start?

We would like to enter the Validation Lab of one of the best artificial intelligence incubators in early September. Talented team members, letters of intent and assignments from organizations will give a lot of credibility to our start. Meet our team »

Put us in contact with people who love to write code in R and are interested in automating business and social science research and primary data collection such as surveying. Check out what sort of code we create »
Introduce us to people who need data and information to make better-informed decisions and analyses in music, film, book publishing, photography services or socially responsible finance.
Share contacts of data journalists who would like to develop stories from big survey programs like Eurobarometer, Afrobarometer and Latinobarómetro, or base their storytelling on data and its visualizations. See our survey harmonization examples »

Do you know such people? Send over this post or connect us in an email or social media message!

Thanks again for your good wishes and encouragement. We hope to hear from you soon!