A large language model (LLM) is a deep learning algorithm that can recognize, summarize, translate, predict, and generate text and other content based on knowledge gained from massive training datasets. Language models use statistical methods to predict the next natural language token in a sequence, effectively determining what the next word should be based on the preceding words. LLMs are neural network-based language models trained on huge datasets, with parameter counts ranging from hundreds of millions to over a trillion. Larger models and larger training datasets improve model quality but introduce infrastructure challenges, since training and serving them demands vast computational resources. The largest and most powerful LLMs are built on the transformer architecture, which processes sequences in parallel and is therefore computationally efficient to train. Their use extends beyond natural language processing applications (translation, AI assistants, chatbots, etc.), with use cases in healthcare, software development, and many other fields.
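To make the next-word idea concrete, the sketch below scores candidate next tokens with a small, publicly available model. It is an illustrative example only, assuming the Hugging Face transformers library and the public GPT-2 checkpoint are available; the prompt string is arbitrary.

```python
# Minimal next-token prediction sketch (illustrative; assumes PyTorch and the
# Hugging Face `transformers` library with the public GPT-2 checkpoint).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"          # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (batch, seq_len, vocab_size)

# Probability distribution over the vocabulary for the token that comes next.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```

The model assigns a probability to every token in its vocabulary; generation simply repeats this step, appending a chosen token and predicting again.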
The dataset (both its size and its content) is central to the performance of an LLM. Modern LLMs are typically trained on vast text datasets sourced from the internet over long periods of time. These datasets are fed to the model using unsupervised learning: the model analyzes the data without explicit instructions on what to do with it. This process allows the LLM to learn words, how they relate to other words, and ultimately the concepts behind them. This includes learning, from context, the various meanings of homographs (words that are spelled the same but have different meanings). From this, LLMs apply what they find in the training data to predict and generate content. LLMs can be customized for specific use cases with additional techniques, such as fine-tuning or prompt tuning, which effectively feed the model smaller, more focused datasets, as sketched below.
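As a rough illustration of what fine-tuning looks like in practice (a sketch, not a prescription), the example below continues training a pre-trained causal language model on a handful of domain-specific strings. The `domain_texts` list is a hypothetical placeholder and the hyperparameters are arbitrary.

```python
# Fine-tuning sketch (illustrative; assumes PyTorch and Hugging Face `transformers`).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token        # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical domain-specific training snippets (placeholder data).
domain_texts = [
    "Ticket #4812: user reports login failures after the 2.3 upgrade ...",
    "Resolution: clear the cached session token and restart the client ...",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

for epoch in range(3):                           # small, arbitrary number of passes
    for text in domain_texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        # For causal LM fine-tuning, the labels are the input ids themselves;
        # the model shifts them internally so each position predicts the next token.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Prompt tuning, by contrast, leaves the model weights frozen and learns only a small set of additional input embeddings, which is far cheaper than updating every parameter.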
The size of a model is often described in terms of its number of parameters, the internal variables, learned during training, that drive the model's output. The more parameters an LLM has, the greater the complexity and sophistication it can achieve. In recent years, the scale of LLMs has grown dramatically; the chart below shows the growth in model size from 2017 to 2021 on a logarithmic scale.
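For a concrete sense of what a parameter count means, the snippet below tallies the parameters of a small public checkpoint. It is an illustration only, assuming the Hugging Face transformers library; production LLMs are orders of magnitude larger.

```python
# Counting model parameters (illustrative; assumes Hugging Face `transformers`).
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")            # smallest GPT-2 checkpoint
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")                        # roughly 124 million
```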
Scaling and maintaining LLMs introduces a number of technical and financial challenges. Building a foundational LLM typically requires months of training time and millions of dollars, and LLM developers also need access to sufficiently large datasets. Deploying LLMs requires significant technical expertise, including a strong understanding of deep learning, transformer models, and distributed software and hardware.