A large language model (LLM) is a deep learning algorithm with the ability to recognize, summarize, translate, predict, and generate text and other content based on knowledge gained from massive training datasets.
May 10, 2023
April 28, 2023
March 30, 2023
BloombergGPT is trained on financial data, including Bloomberg's extensive database.
March 13, 2023
December 26, 2022
December 15, 2022
The only human oversight in constitutional AI is a list of rules or principles; these allow the model to engage with harmful queries by explaining its objections to them.
November 30, 2022
ChatGPT would go on to become the fastest-growing consumer application in history, reaching 100 million active users two months after launch.
May 2, 2022
July 26, 2022
April 6, 2022
April 4, 2022
The model is part of Google Research's Pathways vision: the development of a single model that can generalize across domains and tasks.
March 29, 2022
The paper describes a compute-optimal model called Chinchilla, capable of outperforming larger models on a range of tasks.
February 28, 2022
February 2, 2022
January 27, 2022
The models, which were trained with humans in the loop, would go on to be referred to as GPT-3.5.
December 1, 2021
June 25, 2021
June 12, 2021
June 4, 2021
May 18, 2021
March 9, 2021
December 31, 2020
September 8, 2020
April 29, 2020
January 28, 2020
October 25, 2019
February 14, 2019
October 18, 2018
June 11, 2018
The model would go on to become GPT-1.
2013
2010
1997
These networks allow for the creation of deeper, more complex neural networks capable of handling significantly larger amounts of data.
February 1975
December 1966
January 1966
ELIZA uses a simple set of rules to mimic human conversation, responding to user inputs.
January 1954
The demonstration took place in New York, translating Russian sentences into English using a vocabulary of 250 words and six grammar rules.
1948
January 23, 1913
Markov used the first 20,000 letters of Alexander Pushkin’s 1833 verse novel “Eugene Onegin” to predict the frequency of vowels and consonants in a work of literature.
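To make the idea concrete, here is a minimal sketch of the kind of letter statistics Markov tabulated by hand: counting how often a vowel or consonant follows a vowel or consonant in a run of text. It is only an illustration; Markov worked with the Russian original, and the English vowel set and sample line below are assumptions, not his data.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowels; Markov analyzed Russian text

def transition_counts(text: str) -> Counter:
    """Count vowel/consonant transitions (VV, VC, CV, CC) in a text."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter()
    for prev, curr in zip(letters, letters[1:]):
        prev_kind = "V" if prev in VOWELS else "C"
        curr_kind = "V" if curr in VOWELS else "C"
        counts[prev_kind + curr_kind] += 1  # e.g. "VC" = consonant following a vowel
    return counts

sample = "Markov counted vowels and consonants by hand"  # any text works here
print(transition_counts(sample))
```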
A large language model (LLM) is a deep learning algorithm with the ability to recognize, summarize, translate, predict, and generate text and other content based on knowledge gained from massive training datasets. Language models utilize statistical methods to predict the next natural language token in a sequence, effectively determining what the next word should be based on the preceding words. LLMs are neural network-based language models trained on huge datasets with hundreds of millions to over a trillion parameters. The scale of an LLM and of its training data improves model quality but introduces infrastructure challenges, requiring vast computational resources. The largest and most powerful LLMs are based on the transformer architecture due to its computational efficiency when processing sequences in parallel. Their use extends beyond natural language processing applications (translation, AI assistants, chatbots, etc.), with use cases in healthcare, software development, and many other fields.
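The core "predict the next token" behaviour can be illustrated with a toy bigram model, shown below. This is a deliberately simplified sketch: real LLMs learn such statistics with neural networks holding billions of parameters rather than raw co-occurrence counts, and the tiny corpus here is invented purely for illustration.

```python
from collections import Counter, defaultdict

# A made-up miniature corpus; real LLMs train on internet-scale text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (a bigram table).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation seen after `word` in the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # -> "cat" ("cat" follows "the" twice, "mat"/"fish" once each)
print(predict_next("cat"))  # -> "sat" ("sat" and "ate" tie; the first seen wins)
```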
The dataset (size and content) is central to the performance of an LLM. Modern LLMs are typically trained on vast text datasets sourced from the internet over a long period of time. These datasets are fed into the AI algorithm using unsupervised learning: the model analyzes the data without explicit instructions on what to do with it. This process allows the LLM to learn words, how they connect to other words (the relationships between words), and ultimately the concepts behind them. This includes understanding the various meanings of homographs (words that are spelled the same but have different meanings) from context. From this, LLMs apply what they find in the training data to predict and generate content. LLMs can be customized for specific use cases using additional techniques, such as fine-tuning or prompt tuning, effectively feeding the models smaller datasets to focus on.
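As a concrete sketch of the "smaller dataset to focus on" idea, the example below fine-tunes a small pretrained causal language model on a few domain-specific sentences. It assumes the Hugging Face `transformers` and `datasets` libraries; the model name (`distilgpt2`) and the toy texts are placeholder assumptions chosen only for illustration.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "distilgpt2"  # placeholder: any small causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style models ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# The "smaller dataset to focus on": a handful of invented domain sentences.
texts = ["Q: What is a bond? A: A debt security that pays periodic interest.",
         "Q: What is equity? A: An ownership stake in a company."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-sketch",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                        # continues training the pretrained weights
trainer.save_model("finetuned-sketch")
```

Prompt tuning differs in that the base model's weights stay frozen and only a small set of learned prompt embeddings is trained, which is far cheaper than updating every parameter.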
The size of a model is often described in terms of the number of parameters it has. Parameters are the internal variables that drive the model's output. The more parameters an LLM has, the greater the complexity and sophistication it can achieve. In recent years, the scale of LLMs has grown dramatically. The chart below shows the growth in model size from 2017 to 2021 on a logarithmic scale.
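To make "parameters" concrete, the sketch below counts the trainable weights in a small PyTorch network. The layer sizes are arbitrary assumptions for illustration and are many orders of magnitude smaller than a production LLM.

```python
import torch.nn as nn

# An arbitrarily sized toy network, nothing like a real LLM's architecture.
model = nn.Sequential(
    nn.Embedding(50_000, 512),  # 50,000-word vocabulary x 512-dim embeddings
    nn.Linear(512, 2048),       # 512 x 2048 weights + 2048 biases
    nn.ReLU(),
    nn.Linear(2048, 512),       # 2048 x 512 weights + 512 biases
)

total = sum(p.numel() for p in model.parameters())
print(f"{total:,} trainable parameters")  # ~27.7 million; GPT-3 has about 175 billion
```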
The process of scaling and maintaining LLMs introduces a number of technical and financial challenges. Building a foundational LLM typically requires months of training time and millions of dollars. LLM developers also need access to large enough datasets. Deploying LLMs requires significant technical expertise, including a strong understanding of deep learning, transformer models, and distributed software/hardware.