Integrating Datafication with Blockchain and AI


We live in a world where our everyday events and activities are compiled into vast stores of big data, and this data is constantly being put to use in many ways. It has become a major resource for governments and private companies.

We are in the process of digitising everything, from our use of data to our DNA, and in the future perhaps the integration of our DNA with the objects around us. We therefore need to be keenly aware that we are profoundly intertwined with the data we produce while using the technology platforms on our phones, computers and apps.

This happens to each of us as individuals and, paradoxically, it also happens to the companies, businesses and financial organisations that manage our data. For businesses, big data is the new gold rush. For governments, big data encapsulates many aspects of government business.

But what bearing do AI and blockchain have on datafication? How are they related? Can they be used together, or would this be asking too much?

The Blockchain and Big Data

The blockchain is essentially a distributed ledger of transaction records. In other words, it is designed to store and distribute data in a format that is transparent, secure and immutable. In essence, the blockchain functions as a gigantic shared database. With this kind of capability, it is well positioned to become a kind of central bank of the internet and big data, and a leading driver of change in how big data is perceived and managed.
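To make the idea of an immutable ledger more concrete, here is a minimal sketch in Python. It is not how any particular blockchain is implemented; the function names (`block_hash`, `append_block`, `is_valid`) and the sample transactions are assumptions made for illustration. It only shows the core idea that each block commits to the hash of the block before it, so a later change to old data becomes detectable.

```python
import hashlib
import json


def block_hash(index, prev_hash, records):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    """Append a new block that commits to the block before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["records"])
        if block["hash"] != recomputed:
            return False
    return True


# Hypothetical sample data, for illustration only.
ledger = []
append_block(ledger, [{"from": "alice", "to": "bob", "amount": 10}])
append_block(ledger, [{"from": "bob", "to": "carol", "amount": 4}])
```

Calling `is_valid(ledger)` returns `True` for the untouched chain. If someone later edits an amount in the first block, the stored hash no longer matches the recomputed one and the check fails. A real network adds distribution and consensus on top of this single-machine sketch.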

Image by Dinis Guarda

Identity, financial transactions and the supply chain are the three key areas where data finds its heaviest applications. Banking, commerce, government, trade, finance, travel, licensing and other sectors of the economy all depend on one or more of these three areas. As data grows in size and sophistication, the methods of storing it securely need to evolve as well. Storing such large amounts of data in the cloud or on other conventional storage platforms is costly and can become impractical (Fedak, 2018). Data is only going to get larger and more complex, so a better, more efficient and cheaper means of managing big data has become a necessity. The blockchain, also known as Distributed Ledger Technology or DLT, has properties that lend themselves to such use.

Security and immutability are two things the blockchain can bring to big data. Governments across the world are slowly realizing that the most powerful nations are no longer necessarily those with large military arsenals, but those with the ability to control data and databases. Stuxnet, a computer worm discovered in 2010, infected software that controlled centrifuge operations in Iran’s uranium enrichment facilities. This critical component of Iran’s nuclear program suffered a setback as the malicious code caused more than 1,000 centrifuges to spin out of control and self-destruct (Beaumont and Hopkins, 2012). It is widely cited as one of the first examples of malicious code disrupting a country’s critical program.

With the governments of the US, China and Russia trading allegations of cyber-attacks and cyber-espionage, the need for better protection of sensitive government data is creating a growing demand to explore the blockchain as a viable and cost-effective alternative to conventional cyber-security protocols.

Governments are not the only ones concerned about the safety of their big data; businesses and individuals are as well. Business processes involving financial transactions, supply chain management, payroll management and accounting all generate big data that could be handled better using distributed ledger technology. Cost savings and data quality are the two main benefits the private sector looks for from the blockchain where big data is concerned.

Applying the blockchain to big data is a no-brainer. Much of what is required to protect and validate big data is built into the blockchain. For instance, it is possible to create consensus-driven timestamps, perform audits, validate data sources and ensure all-round big data integrity (Epstein, 2017).

AI and Big Data

Artificial Intelligence is often treated as synonymous with machine learning and deep learning. Why? Because AI cannot reach conclusions and take corresponding actions the way the human brain does without first being trained. To solve complex problems, AI needs to acquire something like the power of cognition; the machines that run on AI need to be taught, just as humans need to go to school to learn certain skills. Teaching machines to solve complex problems the way the human brain would requires large amounts of clean, uncorrupted data. For AI, more is generally better: the more good data it receives as input, the better the outcomes. As more data has become available via the internet, AI has become much better than it was in the 80s or 90s. AI and big data are therefore interconnected, two tools with common goals. Big data constitutes the input, and AI is the output that makes machines more capable of solving problems and accomplishing tasks independently (Patrizio, 2018).

Image by Dinis Guarda

The Interconnectivity of the Blockchain, AI and Big Data

The beauty of the blockchain is that data put into it does not have to be used in the same way it was received. The blockchain can take in data in one form and make it available to multiple applications. This is essentially what datafication is all about: digitizing every component of our lives and finding multiple applications for the resulting data. The blockchain has therefore transcended the regular definition and applications of a conventional database and become something much more useful. It is a distributed database with an added layer of security in the form of cryptography, capable of imparting a trust factor to any data stored within it. The scope of applications of data in the context of DLT can be seen in the use cases proposed by companies running Initial Coin Offerings (ICOs).

We must understand that the ultimate goal of these three elements is to make all spheres of everyday life and work function better. We have also identified that big data is the input of an intelligent machine, while AI is the output (Patrizio, 2018). But what happens if the input data is defective, or has been deliberately corrupted so as to jeopardize outcomes, as may occur in a cyber-attack? This is where the blockchain comes in. The blockchain makes data effectively immutable because it has a mechanism for rejecting altered data: each block is cryptographically linked to the one before it, so data that is altered at one point no longer matches the rest of the chain and is rejected by the network. This ability to verify data before it is accepted can be applied to the machine learning process. By making verified, untampered big data available for deep learning, the desired outputs become far more dependable.
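As a rough illustration of checking data before it reaches a learning algorithm, the Python sketch below compares each incoming record against a digest that is assumed to have been recorded earlier on a ledger. The names `record_digest` and `filter_untampered`, and the sensor-style sample records, are hypothetical. Note the limits of the approach: a matching digest shows that a record has not changed since it was registered, not that it was accurate in the first place.

```python
import hashlib
import json


def record_digest(record):
    """Return a SHA-256 digest of a record in a canonical JSON form."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def filter_untampered(records, trusted_digests):
    """Keep only records whose digest matches the one registered earlier."""
    clean = []
    rejected = []
    for record_id, record in records.items():
        if trusted_digests.get(record_id) == record_digest(record):
            clean.append(record)
        else:
            rejected.append(record_id)
    return clean, rejected


# Digests assumed to have been written to a ledger when the data was created.
original = {
    "r1": {"temperature": 21.5, "label": "normal"},
    "r2": {"temperature": 98.0, "label": "fault"},
}
trusted_digests = {rid: record_digest(rec) for rid, rec in original.items()}

# The copy that arrives at the training pipeline; r2 has been altered.
received = {
    "r1": {"temperature": 21.5, "label": "normal"},
    "r2": {"temperature": 98.0, "label": "normal"},
}
clean, rejected = filter_untampered(received, trusted_digests)
```

Here only `r1` survives into `clean`, while `r2` is listed in `rejected` because its label no longer matches the registered digest. Only the clean records would then be passed on for training.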

Use Cases of AI, Blockchain and Big Data

We round off this discussion with a few use cases of the application of AI and the blockchain to datafication processes.

1. Better Data Security

One of the greatest challenges of data storage remains security. Cyber-security breaches that resulted in data theft have cost the world billions of dollars. In one incident in the US healthcare system, the health insurer Anthem had the personal data of about 78m clients accessed illegally by so-called health hackers (Scannell and Chon, 2015). The blockchain works by using several nodes to store blocks of data. This decentralized storage ensures that a failure, attack or breach affecting one node does not jeopardize the data stored on other nodes. Cryptographic hashing of each new block before it is added to the existing blocks adds to the security provided by the multi-node storage structure. Medical records are not only used by healthcare personnel to diagnose and treat patients; they are also used by health insurance companies to maintain the medical insurance system.

In healthcare, the blockchain also brings an extra layer of security. This is one area where big data is finding massive application, since it is a sector where the security and immutability of data are sacrosanct. Blockchain allows a group of independent actors to share digital assets and information without going through a third party. All the data exchanged between these actors is registered on a distributed ledger. There is no central entity controlling the transactions. Each new piece of information added to the ledger is therefore validated by consensus of the network (Petra, 2018).
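Validation by consensus can be sketched very simply. The Python example below assumes a plain majority vote among nodes, which is a deliberate simplification: real networks use more elaborate protocols, and the validator functions and the entry format shown here are made up for illustration.

```python
def reaches_consensus(votes):
    """Accept only if a strict majority of nodes approve."""
    return sum(votes) > len(votes) / 2


def propose(entry, nodes, ledger):
    """Ask every node to validate the entry; append it only on a majority."""
    votes = [node(entry) for node in nodes]
    if reaches_consensus(votes):
        ledger.append(entry)
        return True
    return False


def strict_validator(entry):
    """Hypothetical rule: a patient id and a 64-character record hash."""
    return bool(entry.get("patient_id")) and len(entry.get("record_hash", "")) == 64


def careless_validator(entry):
    """A faulty node that approves anything it is shown."""
    return True


# Two careful nodes and one faulty node share a ledger.
nodes = [strict_validator, strict_validator, careless_validator]
shared_ledger = []
accepted = propose({"patient_id": "p-001", "record_hash": "a" * 64}, nodes, shared_ledger)
rejected = propose({"patient_id": "", "record_hash": "b" * 64}, nodes, shared_ledger)
```

The first entry is approved by all three nodes and is added. The second, which lacks a patient id, is approved only by the faulty node and is left out, so a single careless or compromised participant cannot push bad data onto the shared ledger by itself.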

Other areas need secure data as well. For instance, how would a nuclear power plant function if it were fed corrupted data? The consequences hardly bear imagining.

Image by Dinis Guarda

2. Fraud Control

Fraud control has many facets. It could apply to the financial services sector, to areas such as government payroll and pension management, and even to electoral processes. We already have a use case in which the blockchain was reportedly used to validate results in a province of Sierra Leone in March 2018. Many countries still have problems with electoral fraud and, as technology improves, electoral fraud is evolving from physical disruption of the voting process to data manipulation. The blockchain can be used to validate the input data (the register of eligible voters) and to match it against voter counts and results, making it significantly harder to falsify results. AI can then be added to the process by taking over functions such as collation of results and validation of ballots cast.
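The matching step described above can be sketched as a small audit in Python. This is a simplified illustration, not a description of any real election system: the function `audit_election` and the sample data are assumptions, and a real system would have to keep ballots secret rather than pairing voter ids with choices as is done here.

```python
from collections import Counter


def audit_election(register, ballots, reported_tally):
    """Recount valid ballots and list every mismatch that is found."""
    seen = set()
    valid_choices = []
    problems = []
    for voter_id, choice in ballots:
        if voter_id not in register:
            problems.append(f"unregistered voter: {voter_id}")
        elif voter_id in seen:
            problems.append(f"duplicate ballot: {voter_id}")
        else:
            seen.add(voter_id)
            valid_choices.append(choice)
    recount = dict(Counter(valid_choices))
    if recount != dict(reported_tally):
        problems.append("reported tally does not match recount")
    return recount, problems


# Hypothetical register (assumed to be anchored on a ledger) and ballots.
register = {"v1", "v2", "v3"}
ballots = [("v1", "A"), ("v2", "B"), ("v2", "B"), ("v9", "A"), ("v3", "A")]
reported_tally = {"A": 3, "B": 2}

recount, problems = audit_election(register, ballots, reported_tally)
```

In this example the recount gives two votes for A and one for B, and the audit reports a duplicate ballot, an unregistered voter and a reported tally that does not match.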

3. Control of Illicit Financial Flows

If the blockchain has a vulnerability, it is the anonymity it confers on financial transactions made with the cryptocurrencies built on it. This is a loophole that criminals can exploit to move money for organized crime, narcotics trafficking or terrorism. Indeed, this is already happening, and law enforcement agencies in the US, Australia and the UK have arrested and prosecuted suspects in this regard. The missing link here could be AI. According to blockchain advisor Ali Ayyash, platforms could be built on AI and taught to track and monitor transactional flows on the blockchain. It could then become possible to recognize the transactional patterns used by criminal entities, and to build appropriate checks against this kind of abuse of cryptocurrency anonymity (Ayyash, 2018).
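A trained model is beyond the scope of a short example, so the Python sketch below swaps in a much simpler statistical rule as a stand-in: it flags transfers whose amount is far from the median of the rest, using the median absolute deviation. The function name `flag_unusual`, the threshold of 3.5 and the sample amounts are assumptions for illustration; a real monitoring platform would look at far richer patterns than amounts alone.

```python
from statistics import median


def flag_unusual(amounts, threshold=3.5):
    """Return the positions of amounts that sit far from the median."""
    med = median(amounts)
    deviations = [abs(a - med) for a in amounts]
    mad = median(deviations)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    flagged = []
    for i, dev in enumerate(deviations):
        score = 0.6745 * dev / mad  # modified z-score
        if score > threshold:
            flagged.append(i)
    return flagged


# Hypothetical transfer amounts observed for one wallet.
amounts = [12.0, 15.5, 11.0, 14.0, 13.5, 9800.0, 12.5]
suspicious = flag_unusual(amounts)
```

Here only the transfer of 9800.0, at position 5, is flagged for review. Flagging is just a prompt for human investigation, not proof of wrongdoing.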

These are just a sample of possible use cases; with the passage of time, new use cases will emerge and existing ones will be improved upon.


In conclusion, we see that the blockchain, AI and big data must all be interlinked if we are to get the desired result of making our machines smarter and capable of more cognitive function. Big data supplies the input for the machine learning algorithms, and the integrity of that input can be protected by the blockchain. The resulting higher-quality data can then be used in the deep learning process to produce outcomes much faster than humans could, and with fewer mistakes.

These three are all links in the same chain, and it is increasingly difficult to see one working without the others. Without big data, attempts at machine learning would be restricted. Without the blockchain enhancing the quality of big data, machine learning would be deficient and would fall short of true artificial intelligence. But with the blockchain and big data producing the correct input, machines can learn deeply and become intelligent enough to produce the output desired by any industry.

Therefore, the way forward is to deepen the integration of AI and the blockchain with big data. The potential gains are enormous.


Carson, B., Walsh, P., Romanelli, G. and Zhumaev, A. (2018). Blockchain beyond the hype: What is the strategic business value?. [online] McKinsey & Company. Available at: [Accessed 22 Aug. 2018].

Smyth, D. (2016). Why blockchain? What can it do for big data?. [online] Big Data Made Simple – One source. Many perspectives. Available at: [Accessed 22 Aug. 2018].

Scannell, K. and Chon, G. (2015). Cyber security: Attack of the health hackers. Financial Times. [online] Available at: [Accessed 22 Aug. 2018].

Fedak, V. (2018). Blockchain and Big Data: the match made in heavens – Towards Data Science. [online] Towards Data Science. Available at: [Accessed 22 Aug. 2018].

Beaumont, P. and Hopkins, N. (2012). US was ‘key player in cyber-attacks on Iran’s nuclear programme’. [online] the Guardian. Available at: [Accessed 22 Aug. 2018].

Petra, A. (2018). Epidemiology And Public Health In The Age of Blockchain. [online] Intelligenthq. Available at: [Accessed 12 Nov. 2018].