A large language model (LLM) is a deep learning algorithm that can recognize, summarize, translate, predict, and generate text and other content based on knowledge gained from massive training datasets. Language models use statistical methods to predict the next natural language token in a sequence, effectively determining what the next word should be based on the preceding words. LLMs are neural network-based language models trained on huge datasets, with parameter counts ranging from hundreds of millions to over a trillion. Larger models and larger training datasets improve model quality but introduce infrastructure challenges, since training and serving them demands vast computational resources. The largest and most powerful LLMs are built on the transformer architecture, which processes sequences in parallel and is therefore computationally efficient to train. Their use extends beyond natural language processing applications (translation, AI assistants, chatbots, etc.), with use cases in healthcare, software development, and many other fields.
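To make the next-word idea concrete, the sketch below scores candidate next tokens with a small, publicly available model. It is an illustrative example only, assuming the Hugging Face transformers library and the public GPT-2 checkpoint are available; the prompt string is arbitrary.

```python
# Minimal next-token prediction sketch (illustrative; assumes PyTorch and the
# Hugging Face `transformers` library with the public GPT-2 checkpoint).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"          # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (batch, seq_len, vocab_size)

# Probability distribution over the vocabulary for the token that comes next.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```

The model assigns a probability to every token in its vocabulary; generation simply repeats this step, appending a chosen token and predicting again.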
The dataset (both its size and its content) is central to the performance of an LLM. Modern LLMs are typically trained on vast text datasets sourced from the internet over long periods of time. These datasets are fed to the model using unsupervised learning: the model analyzes the data without explicit instructions on what to do with it. This process allows the LLM to learn words, how they relate to other words, and ultimately the concepts behind them. This includes learning, from context, the various meanings of homographs (words that are spelled the same but have different meanings). From this, LLMs apply what they find in the training data to predict and generate content. LLMs can be customized for specific use cases with additional techniques, such as fine-tuning or prompt tuning, which effectively feed the model smaller, more focused datasets, as sketched below.
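As a rough illustration of what fine-tuning looks like in practice (a sketch, not a prescription), the example below continues training a pre-trained causal language model on a handful of domain-specific strings. The `domain_texts` list is a hypothetical placeholder and the hyperparameters are arbitrary.

```python
# Fine-tuning sketch (illustrative; assumes PyTorch and Hugging Face `transformers`).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token        # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical domain-specific training snippets (placeholder data).
domain_texts = [
    "Ticket #4812: user reports login failures after the 2.3 upgrade ...",
    "Resolution: clear the cached session token and restart the client ...",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()

for epoch in range(3):                           # small, arbitrary number of passes
    for text in domain_texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        # For causal LM fine-tuning, the labels are the input ids themselves;
        # the model shifts them internally so each position predicts the next token.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Prompt tuning, by contrast, leaves the model weights frozen and learns only a small set of additional input embeddings, which is far cheaper than updating every parameter.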
The size of a model is often described in terms of its number of parameters, the internal variables, learned during training, that drive the model's output. The more parameters an LLM has, the greater the complexity and sophistication it can achieve. In recent years, the scale of LLMs has grown dramatically; the chart below shows the growth in model size from 2017 to 2021 on a logarithmic scale.
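For a concrete sense of what a parameter count means, the snippet below tallies the parameters of a small public checkpoint. It is an illustration only, assuming the Hugging Face transformers library; production LLMs are orders of magnitude larger.

```python
# Counting model parameters (illustrative; assumes Hugging Face `transformers`).
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")            # smallest GPT-2 checkpoint
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} parameters")                        # roughly 124 million
```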
Scaling and maintaining LLMs introduces a number of technical and financial challenges. Building a foundational LLM typically requires months of training time and millions of dollars, and LLM developers also need access to sufficiently large datasets. Deploying LLMs requires significant technical expertise, including a strong understanding of deep learning, transformer models, and distributed software and hardware.