Breaking down data & AI silos

Arief Hühn
June 24, 2019
Data is an essential resource in our digital economy. Unfortunately, we currently live in the era of internet data silos, in which data is captured by individual applications and platforms and is only shared sporadically. This status quo has many causes, but can mostly be attributed to the absence of a global, standardized and trusted infrastructure that facilitates and incentivizes the exchange of data and algorithms between businesses, humans and machines at scale. Here we’ll dive deeper into the functional requirements of a global data exchange layer, such as decentralization, data provenance and data pricing. Once such a layer is realized, we could see the true emergence of digital ecosystems with interoperable services, built-in data monetization and the volume and variety of data needed to build powerful algorithms.

Our observations

  • As part of its strategy to develop a Digital Single Market, the EU wants to build a data economy. Part of this strategy is the re-use of public data sources and publicly funded data.
  • The UK published a report named ‘Unlocking Digital Competition’, which proposes a digital markets unit responsible for a digital code of conduct, promoting openness, personal data mobility and interoperability of services.
  • X-Road was initially developed by the Estonian government as a data sharing solution for its e-government services. It has since been abstracted into a general data sharing solution and is available as open source.
  • The Ocean Project is developing a protocol which orchestrates the exchange of data in a decentralized fashion. At its core are decentralized service agreements and decentralized access control, which execute on decentralized virtual machines. The project envisions that data marketplaces or clearinghouses will be built on top of the Ocean Protocol (a minimal illustrative sketch follows this list).
  • SingularityNET aims to develop a worldwide network that is powered by a decentralized protocol that allows people to create, share and monetize their AI algorithms.
  • Enigma and IOTA are building decentralized data marketplaces that provide a decentralized and secure data infrastructure on top of which applications can be built.
  • With the emergence of a data exchange layer, new business models will appear that harvest the network effects emerging from these open data marketplaces. For instance, Numerai is a crypto-driven hedge fund that crowdsources the predictive modeling for its investment strategies. Similarly, Enigma Catalyst is a platform for data-driven crypto-asset investing and research which allows anyone to develop and share their own investment strategies.
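To make the idea of decentralized service agreements and access control more concrete, below is a minimal sketch in Python. It is purely illustrative: the class names, fields and the `grant_access` logic are assumptions for the sake of the example, not the actual Ocean Protocol API, and in a real protocol these checks would be enforced by smart contracts rather than local code.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class DatasetListing:
    """A dataset offered on a (hypothetical) decentralized marketplace."""
    owner: str      # e.g. the wallet address of the data publisher
    uri: str        # pointer to the (encrypted) data, e.g. an IPFS hash
    price: float    # price per access, in some token
    listing_id: str = field(default_factory=lambda: str(uuid.uuid4()))

@dataclass
class ServiceAgreement:
    """A simplified stand-in for an on-chain service agreement."""
    listing: DatasetListing
    consumer: str   # wallet address of the data consumer
    paid: float = 0.0
    access_granted: bool = False

    def pay(self, amount: float) -> None:
        self.paid += amount

    def grant_access(self) -> str:
        # In a real protocol this condition would be verified by a smart
        # contract running on a decentralized virtual machine.
        if self.paid >= self.listing.price:
            self.access_granted = True
            return f"access-token:{self.listing.listing_id}:{int(time.time())}"
        raise PermissionError("Agreement conditions not met: insufficient payment")

# Usage: a publisher lists a dataset, a consumer pays and receives an access token.
listing = DatasetListing(owner="0xPublisher", uri="ipfs://...", price=10.0)
agreement = ServiceAgreement(listing=listing, consumer="0xConsumer")
agreement.pay(10.0)
print(agreement.grant_access())
```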

Connecting the dots

Enabling factors like the advent of mobile, user-friendly interfaces and the cloud helped generate enormous amounts of user data. This Cambrian explosion of data, combined with rapid progress in storage and computing, allowed machine learning to undergo a renaissance over the past decade. This provided us with better statistical models and algorithms, resulting in more powerful services (e.g. recommender systems, image recognition, voice assistants), which once again lead to increased usage and, hence, even more data.

Unfortunately, most of the data currently sits in business silos. On the one hand, this is caused by big platforms that keep “their” data to themselves to keep competitors at bay; on the other hand, it is caused by the absence of an advanced infrastructure that could facilitate and incentivize the exchange of data and algorithms. Most of the data exchange that currently takes place happens through narrow APIs that are fully controlled by the applications themselves, or by means of ad hoc data dumps via primitive data marketplaces. Consequently, we have still not realized the full potential of our data economy. In the best-case scenario, any dataset, whether from Google or from a small business, could be modelled by any talented data scientist in the world, applied in other contexts and for different purposes, and/or aggregated and fused with datasets from other sources at scale.

However, in order for such a situation to be realized, the internet could benefit from the development of a global data and AI exchange layer with a few important conditions in place. First of all, this data exchange protocol layer preferably needs to have the characteristics of a utility, providing trust, security, openness, privacy and neutrality in a multi-stakeholder environment.

In a previous note, we discussed how the use of decentralized open-source protocols such as permissionless blockchains could be an important enabler of these prerequisites. In contrast, a centrally managed and owned data exchange platform could lead to distrust, as it would introduce a middleman that could become disproportionately powerful and/or could be gamed, both internally and externally. Secondly, a data exchange layer requires some form of data provenance/lineage, i.e. keeping track of the origin of a piece of data, the processes it undergoes, who processed it and where it is going over time. Keeping track of the life cycle of a piece of data is crucial for making the data value chain transparent and, more importantly, for treating data as a real commodity with a state, an owner and a value. Data provenance in turn enables other important functionalities, such as the pricing and transaction of data units and the creation and real-time verification of more fine-grained service agreements in the form of smart contracts.
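As an illustration of what such a provenance record could look like, here is a minimal, hypothetical sketch in Python. The field names and the simple hash-chaining scheme are assumptions made for this example and do not refer to any specific protocol; real systems would anchor such records on a shared ledger rather than in local objects.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ProvenanceEvent:
    """One step in a dataset's lineage: who did what to it, and when."""
    actor: str          # e.g. "sensor-42", "0xDataScientist", "aggregation-service"
    operation: str      # e.g. "created", "cleaned", "aggregated", "sold"
    timestamp: float = field(default_factory=time.time)
    prev_hash: Optional[str] = None   # links this event to the previous one

    def digest(self) -> str:
        payload = json.dumps(
            {"actor": self.actor, "operation": self.operation,
             "timestamp": self.timestamp, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

@dataclass
class DataAsset:
    """A piece of data treated as a commodity with a state, an owner and a value."""
    owner: str
    value: float
    lineage: List[ProvenanceEvent] = field(default_factory=list)

    def record(self, actor: str, operation: str) -> None:
        prev = self.lineage[-1].digest() if self.lineage else None
        self.lineage.append(ProvenanceEvent(actor=actor, operation=operation, prev_hash=prev))

    def verify(self) -> bool:
        # Check that every event still points at the digest of its predecessor.
        for prev, curr in zip(self.lineage, self.lineage[1:]):
            if curr.prev_hash != prev.digest():
                return False
        return True

# Usage: track a dataset from creation through processing to a transaction.
asset = DataAsset(owner="0xSensorOwner", value=5.0)
asset.record("sensor-42", "created")
asset.record("0xDataScientist", "cleaned")
asset.record("0xBuyer", "sold")
print(asset.verify())  # True, unless someone tampers with the lineage
```

Even a toy lineage like this shows why provenance matters: once every transformation is recorded and linked, pricing, ownership transfer and agreement verification can refer to a verifiable history instead of trusting the last party that touched the data.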

These functionalities are currently being developed by a plethora of projects, each competing to become a significant part of the solution in the data exchange layer. Several examples of such (complementary) projects are provided in the observations above. Crucially, these projects have announced that they will collaborate on higher-level features in order to be interoperable and to prevent the emergence of competing standards that would, once again, lead to the siloing of data.

Implications

  • We are likely to see another data explosion in the coming decade, driven by two important causes. The first is the combined adoption of 5G, autonomous sensor-based systems (e.g. autonomous vehicles, IoT, smart environments) and machine-to-machine communication. The second is the increased availability of data due to a frictionless data exchange layer. Consequently, we could see another surge in the advancement of AI, similar to the deep learning renaissance of the last decade, as more data becomes available to train AI systems on.
  • When an open global data exchange layer is realized, we could see increased momentum for data commons and open data. More radically, as there would be no practical reasons to worry about data leaks or downright abuse of data, we could expect the emergence of a new moral imperative in which it is self-evident that data serving the common good should be shared by default.
  • Big tech companies which have built their empires on the data silo paradigm will have to shift their business models towards a more open, decentralized and non-rent seeking model. Facebook’s introduction of Libra can already be understood in this way; it does not natively possess the data that is created on its infrastructure.
  • The widespread availability of data and algorithms, combined with the momentum in open source, could further spur open innovation.
About the author(s)
Arief Hühn's research at FreedomLab revolves around the interdisciplinary aspects of revolutionary tech, including artificial intelligence, virtual reality, augmented reality, quantum computing and blockchain. Aside from delving into technological topics, he is passionate about music, film and culture, and the different ways in which they embody the spirit of the times.