Jesus Rodriguez is the CEO of IntoTheBlock, a market intelligence platform for crypto assets. He has held leadership roles at major technology companies and hedge funds. He is an active investor, speaker, author and guest lecturer at Columbia University.
During the last few days, there has been an explosion of commentary in the crypto community about OpenAI’s new GPT-3 language generator model. Some of the comments express healthy curiosity about GPT-3, while others go to the extreme, asserting that the crypto community should be terrified of it.
The interest is somewhat surprising because the GPT models are not exactly new and they have been making headlines in the machine learning community for over a year now. The research behind the first GPT model was published in June 2018, followed by GPT-2 in February 2019 and most recently GPT-3 two months ago.
I think it is unlikely that GPT-3 by itself can have a major impact in the crypto ecosystem. However, the techniques behind GPT-3 represent the biggest advancement in deep learning in the last few years and, consequently, can become incredibly relevant to the analysis of crypto-assets. In this article, I would like to take a few minutes to dive into some of the concepts behind GPT-3 and contextualize it to the crypto world.
GPT-3 is a massive natural language understanding (NLU) model that uses an astonishing 175 billion parameters to master several language tasks. Its size makes GPT-3 the largest NLU model in the world, surpassing both Microsoft’s Turing-NLG and GPT-3’s own predecessor, GPT-2.
GPT-3 is able to perform several language tasks such as machine translation, question answering, language analysis and, of course, text generation. GPT-3 has captured the attention of the media for its ability to generate fake text that is indistinguishable from real text.
How is this relevant for crypto? Imagine being able to regularly generate fake press releases that move the price of smaller crypto assets. It sounds like a scary threat, but it is not the most important part of GPT-3.
GPT-3 is a language-based model and, consequently, operates on textual datasets. From the crypto market standpoint, that capability is cool but certainly not that interesting. What we should really be paying attention to are the techniques behind GPT-3.
GPT-3 is based on a new deep learning architecture known as transformers. The concept of transformers was originally outlined in the paper “Attention is all you need,” published in 2017 by members of the Google Brain team.
The main innovation of the transformer architecture is the concept of “attention” (hence the title of the paper). Attention is typically used in a type of problem known as Seq2Seq, in which a model processes a sequence of items (words, letters, numbers) and outputs a different sequence. This type of problem is incredibly common in language intelligence scenarios such as text generation, machine translation, question answering and so on.
Every time you see a Seq2Seq scenario, you should associate it with what’s called encoder-decoder architectures. Encoders capture the context of the input sequence and pass it to the decoder, which produces the output sequence. Attention mechanisms address the limitations of traditional neural network architectures by identifying the key aspects of the input that should be “paid attention to.”
Think about a machine translation scenario from Spanish to English. Typically, the encoder will translate the Spanish text input into an intermediate representation known as the “imaginary language” that will be used by the decoder to translate it into English. More traditional deep learning architectures need constant feedback between encoders and decoders, which makes them highly inefficient.
Conceptually, attention mechanisms look at an input sequence and decide at each step which other parts of the sequence are important. For instance, in a machine translation scenario, the attention mechanism would highlight the words the decoder “should pay attention to” in order to perform the translation.
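To make the idea concrete, here is a minimal sketch of scaled dot-product attention, the formulation used in “Attention is all you need,” written in plain Python. The vectors are made up for illustration, and a real model would learn the query, key and value representations during training rather than receive them by hand.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights over the input positions and the
    weighted average of the value vectors (the "context").
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Toy input: three positions with made-up 2-dimensional vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [0.0, 2.0]  # points in the same direction as the second key

weights, context = attention(query, keys, values)
print("weights:", [round(w, 3) for w in weights])
print("context:", [round(c, 3) for c in context])
```

The weights say how much each input position contributes to the output. In this toy case the first position, whose key has nothing in common with the query, receives the smallest weight.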
The transformer architecture that powers models like GPT-3 builds on the traditional encoder-decoder design and inserts attention blocks to improve efficiency. The role of such a block is to look at the entire input and the current outputs and infer dependencies that help optimize the production of the final output.
The transformer architecture has produced models that can be trained on massively large datasets and can be parallelized efficiently. Not surprisingly, after the original Google paper, there has been a race to build super large models that master different language tasks. Google’s BERT, Facebook’s RoBERTa, Microsoft’s Turing-NLG and OpenAI’s GPT-3 are newer examples of these models.
GPT-2 astonished the world by operating using 1.5 billion parameters. That record was smashed by Microsoft’s Turing-NLG, which used 17 billion parameters, only for GPT-3 to use a ridiculous 175 billion parameters. All that happened in a year. Plain and simple: when it comes to transformers, bigger is better.
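A quick back-of-the-envelope calculation puts those parameter counts in perspective. The parameter counts come from the figures above; the storage estimate assumes 2 bytes per parameter (16-bit floats), which is an illustrative assumption rather than a statement about how any of these models is actually stored.

```python
# Parameter counts quoted in the article.
PARAMS = {"GPT-2": 1.5e9, "Turing-NLG": 17e9, "GPT-3": 175e9}

def weight_footprint_gb(param_count, bytes_per_param=2):
    """Rough size of the raw weights in gigabytes.

    bytes_per_param=2 assumes 16-bit floats (an assumption for illustration).
    """
    return param_count * bytes_per_param / 1e9

def times_smaller_than_gpt3(name):
    """How many times larger GPT-3 is than the named model."""
    return PARAMS["GPT-3"] / PARAMS[name]

for name, count in PARAMS.items():
    print(f"{name}: {weight_footprint_gb(count):.0f} GB of weights, "
          f"GPT-3 is {times_smaller_than_gpt3(name):.1f}x larger")
```

Under that assumption, the GPT-3 weights alone come to roughly 350 GB, about 10 times Turing-NLG and more than 100 times GPT-2.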
The first generation of transformer architectures has focused on language tasks. But companies like Facebook and OpenAI have published recent research adapting transformer models to image classification. You might think that this is just an attempt to generate fake images. But the impact goes way beyond that.
Fake image generation is important for streamlining the training of image classification models in the absence of large labeled datasets. There have also been attempts to adapt transformers to financial time series datasets, in the hope that they can advance quantitative trading strategies.
Now that we have some context related to transformers and GPT-3, we can revisit the original question. Is GPT-3 really scary for crypto assets?
Sure, the prospect of models that can generate fake news that moves crypto markets is nothing to joke about. But I think that, in its current form, GPT-3 does not represent a threat to the crypto space. What is more interesting is the impact that transformer architectures can have on the next generation of crypto intelligence solutions. Here are a few real scenarios to consider:
Trading strategies. Obviously, if transformers are proven to be applicable to financial datasets, they can have a major impact on quant strategies for crypto assets. Deep neural networks in general are opening new frontiers in quantitative trading. Quant funds are moving from basic machine learning models like linear regression or decision trees to sophisticated deep learning strategies.
Being natively digital, crypto is the perfect asset class for quant strategies. Techniques such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have gained popularity in the quant space and seem to work well in crypto. Just like in language analysis, transformers could have an edge over CNNs and RNNs, specifically when it comes to focusing “attention” on several segments of a dataset (for example, the March 2020 bitcoin crash) while also operating on massively large volumes of records (e.g. blockchain transactions).
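As a rough illustration of what focusing “attention” on a segment of a price series could look like, the sketch below runs causal self-attention over a short sequence of daily returns, so that each day can only look at itself and earlier days. Everything here is hypothetical: the returns are invented, and the two hand-made features per day stand in for the representations a trained transformer would learn. It shows the mechanics only and is not a trading model.

```python
import math

def causal_self_attention(features):
    """Self-attention in which step t may only attend to steps 0..t."""
    scale = math.sqrt(len(features[0]))
    all_weights = []
    for t, query in enumerate(features):
        # Score the current step against itself and every earlier step.
        scores = [sum(q * k for q, k in zip(query, features[s])) / scale
                  for s in range(t + 1)]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Future steps get zero weight so every row has the same length.
        all_weights.append(weights + [0.0] * (len(features) - t - 1))
    return all_weights

# Hypothetical daily returns; the -0.40 stands in for a crash day.
returns = [0.01, -0.02, -0.40, 0.05, -0.10]
# Hand-made features per day: the return and its magnitude, scaled by 10.
features = [[10 * r, 10 * abs(r)] for r in returns]

weights = causal_self_attention(features)
last_day = weights[-1]
print("attention of the last day:", [round(w, 3) for w in last_day])
```

With these made-up numbers, the last day (a moderate drop) places most of its attention on the earlier crash day, because the two days have the most similar features.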
Blockchain analysis. Transformers can be adapted to detect patterns in blockchains in more computationally efficient ways than current methods. Part of the magic of transformers is their ability to “focus attention” on specific parts of an input dataset and infer potential outputs. Imagine a scenario in which we are analyzing bitcoin mining transactions or flows to exchanges and trying to extrapolate patterns in order book activity. Transformers seem particularly well equipped to attack this task.
Decentralized transformers. There are ongoing efforts to adapt transformer models to decentralized AI architectures like SingularityNet. This type of use case could expand the use of transformers to scenarios we haven’t imagined yet. Until now, transformer models such as GPT-3 have been the privilege of large corporate AI labs that have the data and resources to build and operate such massive neural networks. Decentralized AI offers an alternative, in which the training, execution and monitoring of transformers can occur in decentralized networks that operate based on incentive mechanisms.
Just like other neural network architectures have been able to operate in decentralized infrastructures, it is not crazy to think that soon we will see models like GPT-3 running in decentralized AI platforms like SingularityNet or the Ocean Protocol.
GPT-3 and the transformer architecture represent a major breakthrough in the history of deep learning. In the next few years, we are likely to see transformers influence every major area of deep learning, and the influence is likely to expand into financial markets. Crypto should be a beneficiary of these breakthroughs.
Yes, GPT-3 is impressive, but there is no reason to be terrified. Quite the opposite: we should do the work to adapt these major AI achievements and make crypto the most intelligent asset class in history.