It was Sunday the 26th of January, 10 pm, and I had one day left to submit my article for peer review. I was contemplating, not knowing what to write about, staring at my computer screen. No unfamiliar websites, just websites I had searched for before. Time flew. But suddenly I realized something. Something peculiar. Something odd. I remembered reading an article a couple of days ago about artificial intelligence (AI). The article explained how organizations including The Washington Post, The New York Times, Reuters, and the Associated Press make extensive use of AI to generate content. I was blown away; the content I was reading on my computer screen was actually written by AI.
New applications for AI are being found at an unprecedented rate. Today, we will focus on language models. In short, language models are machine learning models that look at part of a sentence and predict the next word. We can already see these models in smartphone keyboards, speech recognition systems, and even Google Search, where Predictive Search autocompletes your query. In this article, the focus will be GPT-2, a state-of-the-art language model, far ahead of what is currently commercially available.
What is GPT-2?
Consider the following text. Note that the sentence in bold is the prompt written by a human.
Our writings mirror the societies in which we thrive. Our global cultural landscape has been shaped by many generations before us in written text. The liberal internationalism of the ages has given us a healthy skepticism of nationalism and a commitment to multilateralism and free trade. We are far more sophisticated on human rights than ever before. But these new ideals have not turned us into moral crusaders. Instead, we embrace multiculturalism and respect our differences.
As you may have noticed, the above text is entirely generated by the large-scale unsupervised learning model GPT-2. Unsupervised learning is a machine learning technique in which the model finds previously unknown patterns in a data set without pre-existing labels. This allows GPT-2 to generate coherent paragraphs of text, achieve high scores on many language modeling benchmarks, and perform basic reading comprehension. Additionally, GPT-2 can perform question answering and summarization without task-specific training.
How does GPT-2 work?
The core of the GPT-2 model is language modeling. As stated above, language modeling involves predicting the next word. In fact, a language model is a probability distribution over sequences of words: given such a sequence, the model assigns a probability to the whole sequence.
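To make this concrete, here is a minimal sketch of a language model as a probability distribution over word sequences. It uses a toy bigram model with an invented three-sentence corpus; GPT-2 itself estimates these probabilities with a neural network, not with counts.

```python
# Toy bigram language model: a minimal sketch of "a probability
# distribution over sequences of words". The corpus is invented
# purely for illustration.
from collections import Counter

corpus = [
    "the dog ran off",
    "the dog ran away",
    "the motor ran fast",
]

# Count single words and adjacent word pairs in the tiny corpus.
unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def next_word_prob(prev, word):
    """Estimate P(word | prev) from bigram counts."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sequence_prob(words):
    """Probability of the sequence, given its first word, via the
    chain rule: the product of P(w_i | w_{i-1}) for each step."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= next_word_prob(prev, word)
    return p

print(sequence_prob("the dog ran off".split()))  # 2/3 * 1 * 1/3 = 2/9
```

A real language model conditions on far more than one previous word, but the idea is the same: every candidate continuation gets a probability, and the whole sequence's probability is the product of the step-by-step predictions.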
Given a human-written prompt, GPT-2 works by generating a machine-written completion of the prompt. It looks at the first word or first few words and predicts what the next word will be according to the probability distribution. Note that this process is defined recursively: after predicting one word, GPT-2 uses that word as part of the context to predict the next word, and so on. The best way to explain this is by looking at some simple examples:
The dog on the ship ran off, and the dog was found by the crew. (1)
The motor on the ship ran at a speed of about 100 miles per hour. (2)
Note that in the first example (1) we use the word ‘dog’ and in the second example (2) we use the word ‘motor’. You might be wondering: how does GPT-2 know the difference in context between a ‘dog’ and a ‘motor’? After all, the type of running done by a dog is different from that of a motor. Well, GPT-2 is based on an attention model. An attention model focuses the attention of the model, in this case GPT-2, on the words that are most relevant to predicting the next word in the sentence. Take for example the following attention pattern:
This attention pattern is read from left to right. The darker lines show where GPT-2 is paying attention when guessing the next word. In this case, when we look at the last word of the human-written prompt, ‘ran’, the model focuses its attention on the word ‘dog’. This makes sense: to predict the next word, the model needs to know who or what ‘ran’.
GPT-2 has exactly 144 distinct attention patterns (12 layers with 12 attention heads each), each capturing a different attention mechanism. There is, for example, an attention pattern that dedicates its attention solely to the previous word in the sentence, and multiple others that focus only on the first word in the sentence. The latter actually implies that that attention head hasn’t found the linguistic phenomenon it was looking for.
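The computation behind such a pattern can be sketched as scaled dot-product attention, the mechanism used in transformer models like GPT-2: each word's query vector is compared against every word's key vector, and the resulting weights say where the model "looks". The tiny vectors below are invented for illustration and are not real GPT-2 embeddings.

```python
# Minimal sketch of scaled dot-product attention weights for a single
# query. All vectors are toy values chosen for illustration.
import math

def softmax(xs):
    """Turn raw scores into a probability distribution."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Compare one query against every key: dot product, scaled by
    sqrt(dimension), then softmax into attention weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Toy keys for the words "the", "dog", "ran"; a query for "ran" that
# happens to align most with "dog", mimicking the pattern described
# above.
keys = [[0.1, 0.0], [1.0, 0.9], [0.2, 0.1]]
query = [1.0, 1.0]
weights = attention_weights(query, keys)
print(weights)  # the "dog" position receives the largest weight
```

Each of GPT-2's attention heads runs this computation with its own learned query and key projections, which is why different heads settle on different patterns.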
By taking these steps repeatedly, and by applying unsupervised learning techniques to a 40 GB data set of internet text, GPT-2 is able to generate cohesive machine-written texts based on limited human-written prompts.
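The repetitive next-word procedure described above can be sketched as a simple greedy decoding loop. The probability table here is invented for illustration; in GPT-2, a neural network produces the distribution over next words at each step.

```python
# Minimal sketch of the recursive generation loop: start from a prompt
# and repeatedly append the most likely next word. The lookup table is
# a hypothetical stand-in for the model's predictions.

next_word_table = {
    ("the",): {"dog": 0.6, "motor": 0.4},
    ("the", "dog"): {"ran": 0.9, "sat": 0.1},
    ("the", "dog", "ran"): {"off": 0.7, "fast": 0.3},
}

def generate(prompt, steps):
    """Greedy decoding: extend the prompt one word at a time by
    picking the highest-probability next word."""
    tokens = list(prompt)
    for _ in range(steps):
        dist = next_word_table.get(tuple(tokens))
        if dist is None:  # no prediction available for this context
            break
        tokens.append(max(dist, key=dist.get))
    return tokens

print(generate(["the"], 3))  # ['the', 'dog', 'ran', 'off']
```

Real systems usually sample from the distribution instead of always taking the top word, which is what lets GPT-2 produce varied completions for the same prompt.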
Unfortunately, OpenAI, the research laboratory based in San Francisco that developed GPT-2, did not release the trained model. This was because the model was able to generate convincing synthetic text samples in response to an arbitrary input. See for example the text (from the OpenAI website) below. Again, the text in bold is the human-written prompt and the regular text is machine-written.
A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its whereabouts are unknown.
The incident occurred on the downtown train line, which runs from Covington and Ashland stations.
“The theft of this nuclear material will have significant negative consequences on public and environmental health, our workforce and the economy of our nation,” said Tom Hicks, the U.S. Energy Secretary, in a statement. “Our top priority is to secure the theft and ensure it doesn’t happen again.”
The stolen material was taken from the University of Cincinnati’s Research Triangle Park nuclear research site, according to a news release from Department officials.
The Nuclear Regulatory Commission did not immediately release any information.
According to the release, the U.S. Department of Energy’s Office of Nuclear Material Safety and Security is leading that team’s investigation.
“The safety of people, the environment and the nation’s nuclear stockpile is our highest priority,” Hicks said. “We will get to the bottom of this and make no excuses.
You could compare the GPT-2 model to a chameleon, since it adapts to the style and content of the human-written prompt. This allows users to generate realistic and coherent texts in the user’s own style. Note, however, that these developments are vulnerable to malicious intent. Already, many platforms on the internet, including Twitter, Facebook, and Instagram, are employing a wide variety of techniques to limit the spread of disinformation. With the potential release of advanced language models like GPT-2, disinformation can be produced on a massive scale, with the intent to engage in information warfare.
Developments in AI are arriving at an unprecedented rate. It will not take long before even the content on this website is written by AI. In fact, some paragraphs in this article were written by AI, given a human-written prompt. In the near future, we might see systems in which articles are written entirely by AI, without any human prompts.
This article is partially written by AI. For more information about GPT-2, check out the paper ‘Better Language Models and Their Implications’ by OpenAI. If you want to try out GPT-2, follow this link: https://talktotransformer.com/.
This article is written by Berke Aslan