- Last Updated: [[2020-12-13]]
- [[2020-10-01]]
- [[Linguistics]] [[Artificial Intelligence]] [[Natural Language Processing]] #[[GPT-3]]
- **Difficulty of representing human traits**
    - Some inherently human components of language learning are difficult to represent computationally. For example, human children learn language largely through positive reinforcement rather than negative, whereas a computer requires clear boundaries between "correct" and "incorrect" language.
    - Humor and nonsense
        - "`GPT-3's failed attempts at dad jokes.`"
        - "`While GPT-3 can answer supposed common-sense questions, such as how many eyes a giraffe has, it cannot deflect a nonsense question and is led into offering a nonsense answer. Asked, "How many eyes does my foot have?," it will dutifully reply, "My foot has two eyes."`"
    - Common sense
        - "`Specifically, GPT-3 has difficulty with questions of the type 'If I put cheese into the fridge, will it melt?' write the authors, describing the kind of common sense things that elude GPT-3.`"
- [[A lack of sample data makes conclusions difficult.]]
    - Computers require large volumes of linguistic samples to be able to make connections. A sample large enough to be useful is difficult to find in machine-readable (written) form. (A loading sketch for the corpus below appears after the compute section.)
    - "`One of the most cited English linguistic corpora is the Penn Treebank. Derived from widely-different sources, such as IBM computer manuals and transcribed telephone conversations, this corpus contains over 4.5 million words of American English.`"
- **Each language must be represented differently**
    - The variety of natural languages in syntax, vocabulary, morphology, and semantics makes it difficult to make general observations about language as a whole; instead, new models must be built to describe sometimes drastically differing grammatical structures. (A distribution-fitting sketch for the analysis below also appears after the compute section.)
    - "`Using computational methods, Japanese sentence corpora were analyzed and a pattern of log-normality was found in relation to sentence length. Though the exact cause of this lognormality remains unknown, it is precisely this sort of information which computational linguistics is designed to uncover.`"
- **Complex language models require a lot of compute power**
    - [[GPT-3]] uses parameters to tune its responses, giving some words or responses higher or lower weighting. [[OpenAI]] found this quite resource-intensive, because doing well on increasingly large datasets required adding more and more parameters. (The figures quoted below are checked in the sketch after this section.) [*]([[Article/What Is GPT-3? Everything Your Business Needs to Know About OpenAI’s Breakthrough AI Language Program | ZDNet]])
    - "`What optimizes a neural net during training is the adjustment of its weights. The weights, which are also referred to as parameters, are matrices, arrays of rows and columns by which each vector is multiplied. Through multiplication, the many vectors of words, or word fragments, are given greater or lesser weighting in the final output as the neural network is tuned to close the error gap. OpenAI found that to do well on their increasingly large datasets, they had to add more and more weights.`"
    - "`The company described the total compute cycles required, stating that it is the equivalent of running one thousand trillion floating-point operations per second per day for 3,640 days. Computer maker and cloud operator Lambda Computing has estimated that it would take a single GPU 355 years to run that much compute, which, at a standard cloud GPU instance price, would cost $4.6 million. And then there's the memory. To hold all the weight values requires more and more memory as parameters grow in number. GPT-3's 175 billion parameters require 700GB, 10 times more than the memory on a single GPU.`"
    - "`OpenAI has produced its own research on the soaring computer power needed. The firm noted back in 2018 that computing cycles consumed by the largest AI training models have been doubling every 3.4 months since 2012, a faster rate of expansion than was the case for the famous Moore's Law of chip transistor growth. (Mind you, the company also has produced research showing that on a unit basis, the ever-larger models end up being more efficient than prior neural nets that did the same work.) Already, models are under development that use more than a trillion parameters,`"
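- The figures in the compute quotes above can be sanity-checked with simple arithmetic. A minimal sketch, assuming 32-bit (4-byte) weights and reading "one thousand trillion floating-point operations per second" as 1 petaFLOP/s — neither the precision nor the unit is spelled out in the article:
  ```python
  # Back-of-envelope check of the memory and compute figures quoted above.
  # Assumptions (not stated in the article): fp32 (4-byte) weights,
  # and "one thousand trillion FLOP/s" taken as 1 petaFLOP/s.

  params = 175e9                    # GPT-3's reported parameter count
  bytes_per_param = 4               # assuming 32-bit weights
  print(f"weight memory: {params * bytes_per_param / 1e9:.0f} GB")  # -> 700 GB

  pflops_days = 3640                # "1 petaFLOP/s per day for 3,640 days"
  total_flops = pflops_days * 1e15 * 86400       # 86,400 seconds per day
  print(f"total training compute: {total_flops:.2e} FLOPs")         # ~3.1e23

  # Lambda Computing's 355-GPU-year estimate implies this sustained rate:
  rate = total_flops / (355 * 365 * 86400)
  print(f"implied single-GPU rate: {rate / 1e12:.0f} TFLOP/s")      # ~28
  ```
  Both headline numbers fall out directly: 175 billion 4-byte weights is exactly 700 GB, and the 355-year figure corresponds to one GPU sustaining roughly 28 TFLOP/s.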
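- To make the Penn Treebank bullet above concrete: NLTK distributes a sample of the corpus (roughly a tenth of the full 4.5 million words), which shows what a machine-readable linguistic sample looks like. A minimal sketch:
  ```python
  # Inspect the Penn Treebank sample that ships with NLTK
  # (a subset of the full corpus described in the quote above).
  import nltk

  nltk.download("treebank")            # one-time fetch of the sample
  from nltk.corpus import treebank

  print(len(treebank.words()))         # ~100k tokens in the sample
  print(len(treebank.sents()))         # sentence count
  print(treebank.parsed_sents()[0])    # one hand-annotated parse tree
  ```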
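- The Japanese sentence-length analysis mentioned above can also be sketched in a few lines: gather sentence lengths, then fit and test a log-normal distribution. The data here is synthetic, since the corpora from the study aren't assumed to be available:
  ```python
  # Fit a log-normal distribution to sentence lengths, the kind of
  # pattern the quoted study reports for Japanese corpora.
  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  # Placeholder data; in a real analysis these lengths would come
  # from a tokenized corpus rather than a random generator.
  lengths = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)

  # Fit with the location parameter pinned at zero, as usual for lengths.
  shape, loc, scale = stats.lognorm.fit(lengths, floc=0)
  print(f"fitted sigma={shape:.3f}, median length={scale:.1f} tokens")

  # Kolmogorov-Smirnov test of the fitted distribution against the data.
  ks = stats.kstest(lengths, "lognorm", args=(shape, loc, scale))
  print(f"KS statistic={ks.statistic:.4f}")   # near 0 = good fit
  ```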
- **Increased difficulty with longer texts**
    - Machines struggle to build compelling arguments in longer-form content while avoiding repetition.
    - "`GPT-3 samples still sometimes repeat themselves semantically at the document level, start to lose coherence over sufficiently long passages,`"
- **Dependence on human curation**
    - The quality of a machine's output depends on the quality of the inputs it is given. While there have been [attempts](((HQ9J1Ryqo))) to reduce the amount of human intervention a machine needs to be productive, significant effort is still required to collect or curate samples. (A prompt-writing sketch follows this section.)
    - "`getting good output from GPT-3 to some extent requires an investment in creating effective prompts. Some human-devised prompts will coax the program to better results than some other prompts. It's a new version of the adage "garbage in, garbage out." Prompts look like they may become a new domain of programming unto themselves, requiring both savvy and artfulness.`"
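- To make the prompt-crafting point concrete, here is the shape such a prompt often takes: a task description followed by worked examples, ending where the model should continue. The example text is hypothetical, and the commented-out API call reflects the 2020-era OpenAI Python client:
  ```python
  # A few-shot prompt: the examples steer the model toward the desired
  # output format before the real query. The wording is the "artful" part.
  prompt = """\
  Correct the grammar of each sentence.

  Sentence: Him and me goes to the store.
  Corrected: He and I go to the store.

  Sentence: She don't like apples.
  Corrected: She doesn't like apples.

  Sentence: The datas is ready.
  Corrected:"""

  # With the 2020-era OpenAI client, the call looked roughly like:
  # import openai
  # openai.Completion.create(engine="davinci", prompt=prompt,
  #                          max_tokens=20, temperature=0.2)
  print(prompt)
  ```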
- [[Insufficient but convenient samples can lead to biased output.]]
    - "Reliance on samples can lead to bias in output. For example, [[English is the most used language on the internet]], so news articles from the perspective of English-speaking countries feature disproportionately in the probability distribution."
    - "`Bias is a big consideration, not only with GPT-3 but with all programs that are relying on conditional distribution. The underlying approach of the program is to give back exactly what's put into it, like a mirror. That has the potential for replicating biases in the data. There has already been a scholarly discussion of extensive bias in GPT-2.`"
    - "`OpenAI told ZDNet it is using a familiar kind of white hat, black hat wargaming to detect dangers in the program: We've deployed what we call a 'red team' that is tasked with constantly breaking the content filtration system so we can learn more about how and why the model returns bad outputs. Its counterpart is the "blue team" that is tasked with measuring and reducing bias.`"
- **Output tends to be logical, not creative**
    - "Focusing on most probable incidences stifles creativity and innovation." This is particularly significant because higher forms of production put a new spin on old ideas rather than regurgitating them.
    - "`Another big issue is the very broad, lowest-common-denominator nature of GPT-3, the fact that it reinforces only the fattest part of a curve of conditional probability. There is what's known as the long tail, and sometimes a fat tail, of a probability distribution. These are less common instances that may constitute the most innovative examples of language use. Focusing on mirroring the most prevalent text in a society risks driving out creativity and exploration. For the moment, OpenAI's answer to that problem is a setting one can adjust in GPT-3 called a temperature value. Fiddling with this knob will tune GPT-3 to pick less-likely word combinations and so produce text that is perhaps more unusual.`" (A sampling sketch showing the temperature knob closes this note.)
- **Long ramp-up for training**
    - It takes substantial time and effort to train software on relevant data.
- [[The quality of the conclusion relies on the quality of the data.]]
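- The temperature value mentioned under "Output tends to be logical, not creative" has a simple core: the model's raw next-word scores are divided by the temperature before being converted to probabilities, so values above 1 flatten the distribution toward the long tail and values below 1 sharpen it. A minimal sketch with made-up scores:
  ```python
  # Temperature sampling: divide the logits by T before the softmax.
  # T > 1 spreads probability into the tail (more unusual words);
  # T < 1 concentrates it on the most likely word.
  import numpy as np

  def softmax(x):
      e = np.exp(x - x.max())        # subtract max for numerical stability
      return e / e.sum()

  logits = np.array([4.0, 2.0, 1.0, 0.5])   # made-up next-word scores

  for t in (0.5, 1.0, 2.0):
      print(f"T={t}: {np.round(softmax(logits / t), 3)}")
  # Low T: the top word dominates. High T: tail words get real mass,
  # which is the "more unusual text" trade-off the quote describes.
  ```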