How to use pre-trained models in your next business project

Most of the new deep learning models being released, especially in NLP, are very, very large: They have anywhere from hundreds of millions to tens of billions of parameters.

Given a good enough architecture, the larger the model, the more learning capacity it has. Thus, these new models have huge learning capacity and are trained on very, very large datasets.

Because of that, they learn the entire distribution of the datasets they are trained on. One can say that they encode compressed knowledge of these datasets. This allows these models to be used for very interesting applications, the most common one being transfer learning. Transfer learning means fine-tuning pre-trained models on custom datasets/tasks, which requires far less data, and models converge very quickly compared to training from scratch.
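As a rough sketch of what that fine-tuning step can look like in practice (purely an illustration, not part of this article's workflow; the Hugging Face Transformers and Datasets libraries, the bert-base-uncased checkpoint, and the my_reviews.csv file are all assumptions), a pre-trained model can be adapted to a small custom classification dataset like this:

```python
# Illustrative transfer-learning sketch with Hugging Face Transformers (assumed setup).
# "my_reviews.csv" (with "text" and "label" columns) and num_labels=2 are placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tokenize the custom dataset so the pre-trained model can consume it.
dataset = load_dataset("csv", data_files="my_reviews.csv")["train"]
dataset = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
                      batched=True)

# A short fine-tuning run; starting from pre-trained weights needs far less data and time
# than training the same architecture from scratch.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()
```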

Read: [How machines see: everything you need to know about computer vision]

How pre-trained models are the algorithms of the future

Although pre-trained models are also used in computer vision, this article will focus on their cutting-edge use in the natural language processing (NLP) domain. The Transformer is the most common and most powerful architecture being used in these models.

The Transformer architecture as presented in Google's 2017 paper, "Attention Is All You Need."

Although BERT started the NLP transfer learning revolution, we will explore GPT-2 and T5 models. These models are pre-trained—fine-tuning them on specific applications will result in much better evaluation metrics, but we will be using them out of the box, i.e., with no fine-tuning.

Pre-trained NLP models: OpenAI’s GPT-2

GPT-2 created quite a controversy when it was released back in 2019. Since it was very good at generating text, it attracted a lot of media attention and raised many questions about the future of AI.

Trained on 40 GB of textual data, GPT-2 is a very large model containing a massive amount of compressed knowledge from a cross-section of the internet.

GPT-2 has a lot of potential use cases. It can be used to predict the probability of a sentence, which in turn can be used for text autocorrection. Next-word prediction can also be used directly to build an autocomplete component for an IDE (like Visual Studio Code or PyCharm), both for writing code and for general text. Here, we will use it for automatic text generation, and the large corpus of generated text can then be used for natural language analysis.
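To make the sentence-probability idea concrete, here is a minimal sketch (not from the original article) that scores sentences with GPT-2 via the Hugging Face Transformers library; a lower per-token loss means the model considers the sentence more likely, which is the signal an autocorrection system could rank candidates by:

```python
# Sketch: scoring sentence probability with GPT-2 (Hugging Face Transformers assumed).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_score(sentence: str) -> float:
    """Return the average negative log-likelihood per token (lower = more probable)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the language-modeling loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return loss.item()

print(sentence_score("The cat sat on the mat."))      # lower score: more natural sentence
print(sentence_score("The cat sat on the the mat."))  # higher score: less likely sentence
```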

GPT-2 producing suggestions based on textual input. The input on the left is "AI took most of the jobs." The three outputs suggested on the right are "In fact, the company's fist hiring in May," "The only job that was not taken was that of," and "'When I was young, it was like being" without a closing quote mark.

Text generation

The ability of a pre-trained model like GPT-2 to generate coherent text is very impressive. We can give it a prefix text and ask it to generate the next word, phrase, or sentence.
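A minimal sketch of this generation step, assuming the Hugging Face text-generation pipeline and the gpt2-large checkpoint (the linked Colab notebook may use a different setup), looks like this:

```python
# Sketch: generating text continuations from a prefix with GPT-2 (assumed setup).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-large")

samples = generator(
    "Really liked this movie!",   # the prefix used in the samples below
    max_length=100,               # cap the length of each continuation
    num_return_sequences=2,       # ask for two independent samples
    do_sample=True,               # sample instead of greedy decoding for variety
    top_k=50,
)
for sample in samples:
    print(sample["generated_text"], "\n")
```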

An example use case is generating a product reviews dataset to see which types of words are generally used in positive reviews versus negative ones.

Let’s look at some examples, starting with what we get if we start with the positive prefix, “Really liked this movie!”

Sample 1: Loved the character’s emotions at being in constant danger, and how his inner fears were slowly overcome by these events. Also loved that he is so focused on surviving; even for a while it felt like something out of Batman v Superman was showing up every now again because you always knew your enemy would show up eventually anyways 🙂 The ending theme really stuck with me too… I mean yeah they did have to…
Sample 2: It did exactly what I wanted it to do, just a lot of fun. The cast was awesome and the story makes sense so far after seeing everything else already on IMDB. There are hints about being in an underground city from time immemorial due their “power” (or lack thereof) but they don’t really tell that much…I think its more like one big mystery then some other aspects coming together with me for now though….It’s supposed…

As you can see, the word “review” was nowhere in the prefix, but since most reviews are a title followed by the body of the review, this forced the model to adapt to that distribution. Also notice the reference to Batman v Superman.

Let’s see another example. Instead of a movie review, we’ll try to generate a product review using the negative prefix, “A trash product! Do not buy.”

Sample 1: And please, do NOT try to use this as a fertilizer for your gardens or lawns – it will cause some serious water damage if ingested by fish and birds. Use with caution because of the possibility that pesticides may be present in pet foods available online at organic grocery stores such too “good” names like Petsmart, PetSmart & Safeway.How toxic is food waste? Waste products can contribute significant amounts – enough so that even very…
Sample 2: No one will give you a piece of that garbage ever again. Just do yourself and your business a favor and clean up before buying it for nothing more than to save some money or convenience in the future.1/2 oz Bottle Plastic Food Storage Packaging + 1 pack #1636370105702615 (P, 3ml) $7 Buy Now You can purchase these bottles at any retail stores but here is how to get…

Again, the prefix reads like the title of a product review, so the model starts generating text following that pattern. GPT-2 can generate any type of text like this.

A Google Colab notebook is ready to be used for experiments, as is the “Write With Transformer” live demo.

Question answering

Since GPT-2 is trained on text from the web, it “knows” a lot of the human knowledge that was published online up until 2019. It can work for contextual questions as well, but we have to follow the explicit format of “Question: X, Answer:” before letting it attempt to autocomplete. If we force the model to answer open-ended questions this way, to test its general knowledge, the answers can be pretty vague. Here’s what happens:

Sample 1 Question: Who invented the theory of evolution?
Answer: The theory of evolution was first proposed by Charles Darwin in 1859.
Sample 2 Question: How many teeth do humans have?
Answer: Humans have 21 teeth.

As we can see, the pre-trained model gave a pretty detailed answer to the first question. For the second, it tried its best, but it does not compare with Google Search.
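For reference, a hedged sketch of the “Question: X, Answer:” prompting trick described above might look like this (the pipeline setup and decoding parameters are assumptions, not the exact ones used for the samples):

```python
# Sketch: coaxing answers out of GPT-2 with an explicit "Question: ... Answer:" prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-large")

prompt = "Question: Who invented the theory of evolution?\nAnswer:"
# Greedy decoding gives a single, deterministic completion of the "Answer:" slot.
result = generator(prompt, max_length=50, do_sample=False)
print(result[0]["generated_text"])
```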

It’s clear that GPT-2 has huge potential. With fine-tuning, it could be used for the above-mentioned applications with much higher accuracy. But even the out-of-the-box, pre-trained GPT-2 we are evaluating here is not that bad.

Pre-trained NLP models: Google’s T5

Google’s T5 is one of the most advanced natural language models to date. It builds on top of previous work on Transformer models in general. Unlike BERT, which had only encoder blocks, and GPT-2, which had only decoder blocks, T5 uses both.

T5 inputs and outputs:
1) "translate English to German: That is good," becomes "Das ist gut."
2) "cola sentence: The course is jumping well," becomes "not acceptable."
3) "stsb sentence1: The rhino grazed on the grass. sentence2: A rhino is grazing in a field," becomes "3.8."
4) "summarize: state authorities dispatched emergency crews tuesday to survey the damage after an onslaught of severe weather in mississippi…" becomes "six people hospitalized after a storm in attala county."
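To try T5 out of the box, a minimal sketch along these lines should work, assuming the Hugging Face t5-base checkpoint; the task prefixes mirror the examples in the figure above:

```python
# Sketch: using T5 out of the box via Hugging Face Transformers (t5-base assumed).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def run_t5(text: str) -> str:
    """Feed a task-prefixed string to T5 and decode its text output."""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# The same weights handle different tasks; only the text prefix changes.
print(run_t5("translate English to German: That is good."))
print(run_t5("summarize: state authorities dispatched emergency crews tuesday to survey the damage ..."))
```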