- Date Created: [[2020-10-02]]
- [[Linguistics]] [[Artificial Intelligence]] [[Natural Language Processing]]
- ### Compression and decompression of semantic units
    - Distill words from samples into numerical representations, called vectors (a toy co-occurrence sketch appears at the end of this note).
    - This is [[GPT-3]]'s approach. Words are compressed and decompressed to test comprehension.
    - Once broken down into [atomic]([[Principle of Atomicity]]) blocks, it's easier to build or produce new content.
- ### Assign weights according to [[Probability distribution]]
    - Compute the [[Conditional probability]] of words based on their appearance in samples to help predict or produce more words as part of a sequence or series (see the bigram sketch at the end of this note).
    - [[GPT-3]] does this as well.
        - "`GPT-3 is learning in the sense that its parameter weights are being tuned automatically via ingestion of the training data so that the language model ends up better than its explicit programming alone would afford.`"
    - Pros: Given a large sample size, this can be extremely accurate.
    - Cons:
        - Requires A LOT of data, storage, and processing power.
        - Reliance on samples can lead to bias in the output. For example, [[English is the most used language on the internet]], so news articles reflecting the views of English-speaking countries feature disproportionately in the probability distribution.
        - Focusing on the most probable continuations stifles creativity and innovation.
- ### Provide high-quality, curated content as input
    - Feed the program sentence pairs (for example, questions and responses) that are highly relevant for the purpose (see the lookup sketch at the end of this note).
    - Pros: This tends to give higher-quality results faster, because the program only needs to examine a small subset of data rather than a vast linguistic corpus.
    - Cons: Putting together the curated samples requires significant human effort.
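- Sketch 1: words as vectors. A minimal illustration of distilling words into numerical representations, assuming a made-up toy corpus and a ±1-word co-occurrence window. This is not [[GPT-3]]'s actual method (it learns dense embeddings during training); it only shows how "words become vectors derived from samples" can work at all.

```python
from collections import Counter

# Toy corpus; real systems ingest billions of tokens.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

tokens = [line.split() for line in corpus]
vocab = sorted({w for line in tokens for w in line})

# Count co-occurrences within a +/-1 word window.
cooc = {w: Counter() for w in vocab}
for line in tokens:
    for i, w in enumerate(line):
        for j in (i - 1, i + 1):
            if 0 <= j < len(line):
                cooc[w][line[j]] += 1

def vector(word):
    """A word's 'vector' is its row of co-occurrence counts over the vocabulary."""
    return [cooc[word][other] for other in vocab]

print(vocab)
print("cat ->", vector("cat"))
print("dog ->", vector("dog"))  # similar contexts produce similar vectors
```

- Words that appear in similar contexts ("cat" and "dog") end up with similar rows, which is the crude version of the compression idea: meaning is approximated by position in a numerical space.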
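- Sketch 2: conditional probability of the next word. A bigram model is the simplest version of "assign weights according to a probability distribution": count how often each word follows another in the samples, normalize the counts into conditional probabilities, and pick the most likely continuation. The corpus here is invented for illustration; [[GPT-3]] is vastly more elaborate, but the underlying idea of predicting the next token from observed frequencies is the same.

```python
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . the cat ate the fish . "
          "the dog sat on the rug").split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def conditional_probs(prev):
    """P(next word | previous word), estimated from raw counts."""
    counts = following[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

print(conditional_probs("the"))  # e.g. 'cat' is most likely after 'the'

# Greedy generation: always take the most probable next word.
word, sentence = "the", ["the"]
for _ in range(5):
    probs = conditional_probs(word)
    if not probs:  # dead end: word never appeared mid-sequence
        break
    word = max(probs, key=probs.get)
    sentence.append(word)
print(" ".join(sentence))
```

- The cons listed above fall out of this sketch directly: the estimates only become accurate with enormous amounts of data, and greedy selection of the most probable word reproduces whatever the samples over-represent.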
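- Sketch 3: curated sentence pairs. A hedged illustration of the curated-content approach, assuming a hand-written list of question/response pairs and a simple word-overlap matching rule (both invented here). Rather than training on the pairs, this toy version just retrieves the closest curated response, which is enough to show why a small high-quality set can be examined quickly compared with a vast corpus.

```python
# A small, hand-curated set of question/response pairs.
# In practice a human editor writes and reviews these, which is the main cost.
curated_pairs = [
    ("what are word vectors",
     "Numerical representations distilled from word samples."),
    ("why does gpt-3 need so much data",
     "Its probability estimates improve with sample size."),
    ("what is conditional probability",
     "The chance of a word given the words that came before it."),
]

def respond(question):
    """Return the curated response whose question shares the most words with the input."""
    asked = set(question.lower().split())
    def overlap(pair):
        return len(asked & set(pair[0].split()))
    _, best_response = max(curated_pairs, key=overlap)
    return best_response

print(respond("Why does GPT-3 need so much training data?"))
```

- The trade-off in the note is visible here: quality depends entirely on how good and how relevant the curated pairs are, and every pair has to be written by a person.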