---
Title: What Is GPT-3? Everything Your Business Needs to Know About OpenAI’s Breakthrough AI Language Program | ZDNet
Author: Tiernan Ray
Tags: readwise, articles
date: 2024-01-30
---

# What Is GPT-3? Everything Your Business Needs to Know About OpenAI’s Breakthrough AI Language Program | ZDNet

![rw-book-cover](https://readwise-assets.s3.amazonaws.com/static/images/article1.be68295a7e40.png)

URL:: https://www.zdnet.com/article/what-is-gpt-3-everything-business-needs-to-know-about-openais-breakthrough-ai-language-program/
Author:: Tiernan Ray

## Highlights

> GPT-3 is a computer program created by the privately held San Francisco startup OpenAI. It is a gigantic neural network, and as such, it is part of the deep learning segment of machine learning, which is itself a branch of the field of computer science known as artificial intelligence, or AI. The program is better than any prior program at producing lines of text that sound like they could have been written by a human.

> GPT-3 is compute-hungry, putting it beyond the use of most companies in any conceivable on-premise fashion. Its generated text can be impressive at first blush, but long compositions tend to become somewhat senseless. And it has great potential for amplifying biases, including racism and sexism.

> The name GPT-3 is an acronym that stands for "generative pre-training," of which this is the third version so far. It's generative because unlike other neural networks that spit out a numeric score or a yes or no answer, GPT-3 can generate long sequences of original text as its output. It is pre-trained in the sense that it has not been built with any domain knowledge, even though it can complete domain-specific tasks, such as foreign-language translation.
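To make the "generative" distinction concrete: an autoregressive language model produces text by repeatedly estimating a distribution over the next word and sampling from it, feeding each choice back in as new context. A minimal sketch in Python, using a hand-built toy bigram table instead of a real neural network (every word and probability here is invented for illustration):

```python
import random

# Toy bigram model: P(next word | current word).
# A real language model conditions on far more context;
# these words and probabilities are invented for illustration.
bigram_probs = {
    "the":    {"cat": 0.5, "dog": 0.3, "fridge": 0.2},
    "cat":    {"sat": 0.6, "ran": 0.4},
    "dog":    {"sat": 0.3, "ran": 0.7},
    "fridge": {"hummed": 1.0},
    "sat":    {"quietly": 1.0},
    "ran":    {"away": 1.0},
}

def generate(prompt_word: str, max_words: int = 5) -> str:
    """Autoregressive generation: sample one word at a time,
    conditioning each draw on the word chosen just before it."""
    words = [prompt_word]
    for _ in range(max_words):
        dist = bigram_probs.get(words[-1])
        if dist is None:  # no known continuation; stop generating
            break
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat quietly"
```

A classifier would stop after one numeric score; the loop above is what lets a generative model emit sequences of arbitrary length.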
> A language model, in the case of GPT-3, is a program that calculates how likely one word is to appear in a text given the other words in the text. That is what is known as the conditional probability of words.

> When the neural network is being developed, called the training phase, GPT-3 is fed millions and millions of samples of text and it converts words into what are called vectors, numeric representations.

> then it can predict what words come next when it is prompted by a person typing an initial word or words. That action of prediction is known in machine learning as inference.

> GPT-3's ability to respond in a way consistent with an example task, including forms to which it was never exposed before, makes it what is called a "few-shot" language model. Instead of being extensively tuned, or "trained," as it's called, on a given task, GPT-3 has so much information already about the many ways that words combine that it can be given only a handful of examples of a task, what's called a fine-tuning step, and it gains the ability to also perform that new task.

> it initially would not release to the public the most-capable version, saying it was too dangerous to release into the wild because of the risk of mass-production of false and misleading text. OpenAI has subsequently made it available for download.

> This time around, OpenAI is not providing any downloads. Instead, it has turned on a cloud-based API endpoint, making GPT-3 an as-a-service offering. (Think of it as LMaaS, language-model-as-a-service.) The reason, claims OpenAI, is both to limit GPT-3's use by bad actors and to make money.

> Game maker Latitude is using GPT-3 to enhance its text-based adventure game, AI Dungeon. Usually, an adventure game would require a complex decision tree to script many possible paths through the game. Instead, GPT-3 can dynamically generate a changing state of gameplay in response to users' typed actions.

> Already, task automation is going beyond natural language to generating computer code. Code is a language, and GPT-3 can infer the most likely syntax of operators and operands in different programming languages, and it can produce sequences that can be successfully compiled and run.

> An early example lit up the Twitter-verse, from app development startup Debuild. The company's chief, Sharif Shameem, was able to construct a program where you type your description of a software UI in plain English, and GPT-3 responds with computer code using the JSX syntax extension to JavaScript. That code produces a UI matching what you've described.

> This is mind blowing.
>
> With GPT-3, I built a layout generator where you just describe any layout you want, and it generates the JSX code for you.
>
> W H A T pic.twitter.com/w8JkrZO4lk
>
> — Sharif Shameem (@sharifshameem) July 13, 2020

> Shameem showed that by describing a UI with multiple buttons, with a single sentence he could describe an entire program, albeit a simple one such as computing basic arithmetic and displaying the result, and GPT-3 would produce all the code for it and display the running app.

> I just built a *functioning* React app by describing what I wanted to GPT-3.
>
> I'm still in awe. pic.twitter.com/UUKSYz2NJO
>
> — Sharif Shameem (@sharifshameem) July 17, 2020
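As a rough sketch of what few-shot use of the hosted API looked like in practice: the demonstrations live entirely inside the prompt, and no weights are updated. This assumes the 2020-era `openai` Python package and its Completion endpoint; the engine name, placeholder key, and translation examples are illustrative, not taken from the article:

```python
import openai  # 2020-era openai package, Completion API as offered at launch

openai.api_key = "YOUR_API_KEY"  # placeholder

# Few-shot prompting: the "training" is just a handful of
# examples packed into the prompt; the model continues the pattern.
prompt = (
    "English: Good morning\nFrench: Bonjour\n"
    "English: Thank you\nFrench: Merci\n"
    "English: Where is the library?\nFrench:"
)

response = openai.Completion.create(
    engine="davinci",  # illustrative engine name
    prompt=prompt,
    max_tokens=20,
    temperature=0.3,
    stop="\n",         # stop at the end of the answer line
)
print(response.choices[0].text.strip())
```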
> The first advance was the use of what's known as attention.

> Every sentence was crammed into the same-sized vector, no matter how long the sentence.

> Bengio and his team concluded that this rigid approach was a bottleneck. A language model should be able to search across many vectors of different lengths to find the words that optimize the conditional probability. And so they devised a way to let the neural net flexibly compress words into vectors of different sizes, as well as to allow the program to flexibly search across those vectors for the context that would matter. They called this attention.

> another innovation that arrived in 2015 and that was even more central to OpenAI's work, known as unsupervised learning.

> The focus up until that time for most language models had been supervised learning with what is known as labeled data. Given an input, a neural net is also given an example output as the objective version of the answer.

> But having the desired output carefully labeled can be a problem because it requires lots of curation of data, such as assembling example sentence pairs by human judgment, which is time-consuming and resource-intensive.

> Instead of being given a sentence pair, the network was given only single sentences and had to compress each one to a vector and decompress each one back to the original sentence.

> In 2018, the OpenAI team combined these two elements, the attention mechanism that Bengio and colleagues developed, which would roam across many word vectors, and the unsupervised pre-training approach of Dai and Le that would gobble large amounts of text, compress it and decompress it to reproduce the original text.

> HOW DOES GPT-3 DEPEND ON COMPUTE POWER?

> With the arrival of GPT-1, 2, and 3, the scale of computing has become an essential ingredient for progress. The models use more and more computer power when they are being trained to achieve better results.

> What optimizes a neural net during training is the adjustment of its weights. The weights, which are also referred to as parameters, are matrices, arrays of rows and columns by which each vector is multiplied. Through multiplication, the many vectors of words, or word fragments, are given greater or lesser weighting in the final output as the neural network is tuned to close the error gap.

> OpenAI found that to do well on their increasingly large datasets, they had to add more and more weights.

> It's that kind of enormous power requirement that is propelling the field of computer chips. It has driven up the share price of Nvidia, the dominant GPU supplier for AI training, by almost 5,000% over the past ten years.

> The company described the total compute cycles required, stating that it is the equivalent of running one thousand trillion floating-point operations per second per day for 3,640 days.

> Computer maker and cloud operator Lambda Computing has estimated that it would take a single GPU 355 years to run that much compute, which, at a standard cloud GPU instance price, would cost $4.6 million. And then there's the memory. To hold all the weight values requires more and more memory as parameters grow in number. GPT-3's 175 billion parameters require 700GB, 10 times more than the memory on a single GPU.

- Note: Where in the Hypernet can this be stored? Only place would be in people themselves - a neural network made of real people.
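The memory and compute figures in the last two highlights can be sanity-checked with back-of-the-envelope arithmetic. A sketch assuming 32-bit (four-byte) weights and a single GPU sustaining roughly 28 teraflops; both assumptions are mine, chosen because they reproduce the article's numbers:

```python
# Memory: 175 billion parameters at 4 bytes each (fp32 assumption)
params = 175e9
bytes_needed = params * 4
print(f"{bytes_needed / 1e9:.0f} GB")  # -> 700 GB

# Compute: "a thousand trillion flop/s per day for 3,640 days",
# i.e. 3,640 petaflop/s-days in total
total_flops = 1e15 * 86_400 * 3_640   # 86,400 seconds per day
print(f"{total_flops:.3g} FLOPs")     # -> ~3.14e23

# Time on one GPU sustaining ~28 teraflops (assumption)
gpu_flops = 28e12
years = total_flops / gpu_flops / (86_400 * 365)
print(f"{years:.0f} years")           # -> ~356, matching Lambda's 355
```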
> OpenAI has produced its own research on the soaring computer power needed. The firm noted back in 2018 that computing cycles consumed by the largest AI training models have been doubling every 3.4 months since 2012, a faster rate of expansion than was the case for the famous Moore's Law of chip transistor growth. (Mind you, the company also has produced research showing that on a unit basis, the ever-larger models end up being more efficient than prior neural nets that did the same work.)

> Already, models are under development that use more than a trillion parameters,

> "GPT-3 samples still sometimes repeat themselves semantically at the document level, start to lose coherence over sufficiently long passages,"

> Specifically, GPT-3 has difficulty with questions of the type "If I put cheese into the fridge, will it melt?" write the authors, describing the kind of common-sense things that elude GPT-3.

> GPT-3's failed attempts at dad jokes.

> While GPT-3 can answer supposed common-sense questions, such as how many eyes a giraffe has, it cannot deflect a nonsense question and is led into offering a nonsense answer. Asked, "How many eyes does my foot have?," it will dutifully reply, "My foot has two eyes."

- Note: [[Satina]] could maybe recognize jokes or give answers to nonsensical questions that are illogical but stylistically inferrable to humans but not machines. Maybe [[Indy]] puts in things that are unexpected, either jokes or abrupt changes in languages.

> getting good output from GPT-3 to some extent requires an investment in creating effective prompts. Some human-devised prompts will coax the program to better results than some other prompts. It's a new version of the adage "garbage in, garbage out." Prompts look like they may become a new domain of programming unto themselves, requiring both savvy and artfulness.

> Bias is a big consideration, not only with GPT-3 but with all programs that are relying on conditional distribution. The underlying approach of the program is to give back exactly what's put into it, like a mirror. That has the potential for replicating biases in the data. There has already been a scholarly discussion of extensive bias in GPT-2.

> OpenAI told ZDNet it is using a familiar kind of white hat, black hat wargaming to detect dangers in the program:

> We've deployed what we call a 'red team' that is tasked with constantly breaking the content filtration system so we can learn more about how and why the model returns bad outputs. Its counterpart is the "blue team" that is tasked with measuring and reducing bias.

> Another big issue is the very broad, lowest-common-denominator nature of GPT-3, the fact that it reinforces only the fattest part of a curve of conditional probability. There is what's known as the long tail, and sometimes a fat tail, of a probability distribution. These are less common instances that may constitute the most innovative examples of language use. Focusing on mirroring the most prevalent text in a society risks driving out creativity and exploration.

> For the moment, OpenAI's answer to that problem is a setting one can adjust in GPT-3 called a temperature value. Fiddling with this knob will tune GPT-3 to pick less-likely word combinations and so produce text that is perhaps more unusual.
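Mechanically, temperature is a divisor applied to the model's raw scores (logits) before they are turned into sampling probabilities; higher values flatten the distribution so lower-probability words are drawn more often. A minimal NumPy sketch, with the vocabulary and scores invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

words  = ["the", "a", "fridge", "melody"]  # invented vocabulary
logits = np.array([3.0, 2.5, 0.5, -1.0])   # invented raw model scores

def sample(logits, temperature):
    """Softmax with temperature, then one draw from the distribution."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for stability
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs), probs

for t in (0.2, 1.0, 2.0):
    idx, probs = sample(logits, t)
    print(f"T={t}: picked {words[idx]!r}, P(melody)={probs[3]:.3f}")
```

At low temperature the likeliest word dominates; raising it shifts probability mass toward the long tail discussed above.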
> the biggest practical shortcoming is the scale required to train and run GPT-3.

> GPT-3 is learning in the sense that its parameter weights are being tuned automatically via ingestion of the training data so that the language model ends up better than its explicit programming alone would afford.

> Some might argue that a program that can calculate probabilities across vast assemblages of text may be a different kind of intelligence, perhaps an alien intelligence other than our own.

> If it is possible to consider other forms of intelligence, then an emergent property such as the distributed representations that take shape inside neural nets may be one place to look for it.

> GPT-3 has opened a new chapter in machine learning. Its most striking feature is its generality.