LangChain is an open-source framework for developing applications powered by large language models (LLMs). The framework combines LLMs with other sources of computation and knowledge to help develop new applications that are data-aware (connecting a language model with other sources of information) and agentic (allowing a language model to interact dynamically with its environment). LangChain is available as both a Python and a TypeScript package.
LangChain aims to streamline the development of a wide range of applications, including chatbots, generative question-answering, and summarization. It does this by "chaining" components from multiple modules to build more advanced LLM use cases.
- Components—LangChain provides modular abstractions for the components needed to work with language models. LangChain also has collections of implementations for these abstractions.
- Off-the-shelf chains—Chains can be thought of as assembling components in particular ways to best accomplish a particular use case. They are intended to be a higher-level interface through which users can get started with a specific use case. These chains are also designed to be customizable.
LangChain supports a number of language models, including those from prominent AI platforms, such as OpenAI, Hugging Face, and Anthropic. The framework provides APIs to access and interact with LLMs and an array of tools, components, and interfaces to help the development process. LangChain also has extensive documentation to help users become familiar with the framework.
LangChain was created by Harrison Chase, with the first version released on October 24, 2022. In a tweet thread upon its release, Chase described LangChain as
a python package aimed at helping build LLM applications through composability... The real power comes when you are able to combine [LLMs] with other things... LangChain aims to help with that by creating… a comprehensive collection of pieces you would ever want to combine… a flexible interface for combining pieces into a single comprehensive ‘chain’
LangChain was initially developed by Chase while he was working at Robust Intelligence, an MLOps company testing and validating machine learning models. Chase led the ML team at Robust Intelligence. Prior to that, he led the entity linking team at Kensho (a fintech start-up) and attended Harvard University, studying statistics and computer science.
LangChain began as an open-source side project with no intention of starting a company. Chase saw common patterns in how people were approaching problems utilizing language models. LangChain is an attempt to create abstractions that alleviate these issues. With the project growing significantly, Chase cofounded a company to develop the framework with Ankush Gola in January 2023. Gola is a software engineer who previously worked at Unfold, Robust Intelligence, and Facebook.
On February 17, 2023, LangChain released support for TypeScript, allowing users to recreate applications in TypeScript natively. The TypeScript package mirrors the Python package as closely as possible, utilizing the same serializable format such that artifacts can be shared between languages. LangChain initially chose Python, as it is popular among machine learning research-oriented communities. However, as interest in the project grew, it was being used by people across the stack, many of whom prefer JavaScript.
The project continued to grow, reaching over 20K stars on GitHub, 10K active Discord members, over 30K followers on Twitter, and over 350 contributors by early April 2023. On April 4, 2023, LangChain announced it had raised $10 million in seed funding. The round was led by Benchmark, who will also provide LangChain with counsel. Benchmark has previously been the first lead investor in major open-source projects such as Docker, Confluent, Elastic, and ClickHouse.
On April 11, 2023, LangChain announced support for running LangChain.js in browsers, Cloudflare Workers, Vercel/Next.js, Deno, and Supabase Edge Functions, alongside existing support for Node.js ESM and CJS. Originally, LangChain.js was designed to run in Node.js; the team then began collecting feedback from the LangChain community to determine which other JS runtimes the framework should support.
Shortly after its seed round, on April 13, 2023, Business Insider reported that LangChain had raised between $20 million and $25 million in funding from Sequoia, giving the company a valuation of at least $200 million. The deal was headed up by Sonya Huang, a growth investor known for her work in generative AI. Sequoia avoided a formal fundraising process by pre-empting the round.
Basic data types and schema used throughout the LangChain codebase include the following:
- Text—the primary interface when working with language models; many of the interfaces in LangChain are centered around text
- Chat messages—many models primarily interact through a chat interface, with some providers' APIs expecting chat messages. LangChain supports three chat message roles: system (instructions to the AI system), human (information provided by the human interacting with the AI system), and AI (information returned by the AI system)
- Examples—input/output pairs that can be used for training or evaluating a model. These pairs can be for a single model or for a chain
- Document—a piece of unstructured data consisting of its content and metadata
LangChain offers standard, extendable interfaces and external integrations for the following modules:
- Model I/O—interfacing with language models
- Data connection—interfacing with application-specific data
- Chains—sequences of calls
- Agents—let chains choose the tools to use given high-level directives
- Memory—persist application state between runs of a chain
- Callbacks—log and stream intermediate steps of any chain
LangChain enables access to a range of pre-trained models that generate outputs based on the prompt and input provided. These models can be further fine-tuned to match the specific needs of the user. The following are the three types of models LangChain works with:
- LLMs—return a text string based on a text string input
- Chat models—usually backed by a language model, chat models take a list of chat messages as input and return a chat message
- Text embedding models—return a list of floats from text input
The prompt or input is rarely hard-coded; instead it is constructed from multiple components. A prompt template is responsible for constructing the input, and LangChain provides several classes and functions to simplify working with prompts. The template defines the structure of the prompt, including its format and content. LangChain's prompt handling comprises four components:
- Prompt value—a class that exposes methods to be converted to the exact input types that each model type expects (e.g., text or chat messages)
- Prompt template—the object responsible for creating a prompt value; it exposes a method that takes in input variables and returns a prompt value
- Example selectors—used to include examples in the prompt; these can be hardcoded or dynamically selected
- Output parser—responsible for instructing the model on how the output should be formatted and parsing the output into the desired format (including retrying if necessary)
LangChain offers functionality to load, transform, store, and query a user's data via the following:
- Document loaders—load documents from many different sources
- Document transformers—split documents, drop redundant documents, and more
- Text embedding models—take unstructured text and turn it into a list of floating point numbers
- Vector stores—store and search over embedded data
- Retrievers—query your data
Indexes structure documents such that LLMs can interact best with them. LangChain provides functions for working with documents, different types of indexes, and examples for using those indexes in chains. The most common method of using indexes in chains is in a retrieval step, taking a user's query and returning the most relevant documents. Typically, this refers to retrieving unstructured documents, such as text documents. The primary index and retrieval types supported by LangChain are centered around vector databases.
A chain is a generic concept referring to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case. The most commonly used chain is an LLM chain, which combines a prompt template, a model, and guardrails to take user input, format it accordingly, pass it to the model, and get a response, before validating and fixing the model output if necessary.
Some generative AI applications require more than a predetermined chain of calls to LLMs and other tools. They may also need an unknown chain based on the user's inputs. These applications require an "agent" that has access to a suite of tools and can decide which to call. The agent acts as a "wrapper" around a model that takes user inputs and returns a response corresponding to an “action” to take and a corresponding “action input.” LangChain provides a dedicated module for building agents, which can be used to create chatbots or personal assistants.
Memory is the concept of storing and retrieving data during a conversation. By default, chains and agents are stateless; they react to each new input query independently of prior inputs. Memory allows them to recall previous interactions with users, providing them with more personalized and contextualized responses. The two main methods are based on input (fetch any relevant pieces of data) or input and output (update the state accordingly). The two main types of memory are short term and long term. Short term typically refers to how data is passed in the context of a singular conversation. Long term deals with fetching and updating information between conversations.
LangChain provides a callbacks system, allowing users to hook into the various stages of their LLM application. This can be useful for logging, monitoring, streaming, and other tasks. Users access these events through the callback argument available throughout the API.
LangChain can help with a number of end-to-end use cases, including those below:
- Personal assistants
- Question answering over docs
- Chatbots
- Querying tabular data
Personal assistants can be built upon the following LangChain components:
- Prompt template guiding how the personal assistant acts
- Memory to hold a conversation (short term) and improve its interactions over time (long term)
- Tools that allow it to interface with other sources of computation and knowledge
- Agent to understand the actions it should take
- Agent executor to create an environment for the agent to use the tools
LangChain allows users to provide documents for LLMs to ingest for question answering. Users can create an index over these documents to save time retrieving information.
LangChain users can construct a chatbot from an LLM or chat model, a prompt template that guides how the chatbot acts, and memory to ensure the model changes its outputs based on previous interactions. LangChain-produced chatbots can be differentiated by combining them with other sources of data using similar techniques as the question answering over documents use case.
LangChain offers various resources for querying information stored in tabular data, including CSVs, Excel sheets, and SQL tables. This includes document loading, indexing, and querying using language models to interact with the data, as well as chains and agents to perform more advanced tasks.
LangChain integrates with a number of LLMs, systems, and products to help developers build applications in the environment they choose. These integrations can be grouped by the core LangChain modules they map to:
- LLM providers
- Chat model providers
- Text embedding model providers
- Document loader integrations
- Text splitter integrations
- Vectorstore providers
- Retriever providers
- Tool providers
- Toolkit integrations
A comprehensive list of LangChain integrations can be found in their documentation. A short list of key integrations includes the following:
- LLMs—OpenAI, Hugging Face, and Anthropic
- Cloud Platforms—Azure, Amazon, Google Cloud, and other popular cloud providers
- Data Sources—Google Drive, Notion, Wikipedia, and Apify Actors
With the widespread availability of new generative AI models, LangChain has become a popular framework to chain AI functionality and build new applications. LangChain is being used across a large number of projects, including the following:
- An admissions explorer for schools and universities
- Law Pilot, an AI chatbot for legal documents
- SheetSense, a chatbot for user-input data
- An AI-powered customer service helpline app
- Flow, an app creating personalized pages on the internet
- DuetGPT, a semi-autonomous developer assistant