Mistral AI is a developer of generative AI models and open-source alternatives to proprietary large language model (LLM) platforms. It offers text-based models for applications such as art generation, content creation, chatbots, virtual assistants, language translation, and customer service. The company serves business clients, helping them improve research and development, customer care, and marketing processes through new AI-powered tools. Mistral AI positions its platform as a rival to OpenAI's ChatGPT that addresses the public misuse challenges and security issues facing ChatGPT; to do this, it develops in the open, because the founders believe an open-source approach is the best safeguard against misuse.
Mistral AI's generative AI platform is available in early access, serving the company's open models. Mistral AI's models include the following:
- Mistral 7B—A 7B dense transformer for a variety of use cases. Supports English and code, and an 8k context window.
- Mixtral 8x7B—A sparse mixture-of-experts model with stronger capabilities than Mistral 7B. Uses 12B active parameters out of 45B total. Supports multiple languages, code, and a 32k context window.
Mistral AI provides two types of access to its LLMs: an API with pay-as-you-go pricing, and open-weight downloads under the Apache 2.0 license, available directly from its documentation and on Hugging Face. The Mistral AI API is in beta; users can join the waiting list through the company's platform and access the chat endpoint as soon as their subscription is active.
Headquartered in Paris, Mistral AI was founded in May 2023 by Arthur Mensch (CEO), Timothée Lacroix (CTO), and Guillaume Lample (chief science officer), alumni of Google DeepMind and Meta. Mistral AI has stated that French investment bank Bpifrance and former Google CEO Eric Schmidt are shareholders in the company.
Four weeks after its founding, in June 2023, Mistral AI raised a $113 million seed round, leading some to speculate that there was an "AI bubble," especially as the funding was raised before Mistral AI had a product or customers. The round was led by Lightspeed Venture Partners with participation from Redpoint, Index Ventures, Xavier Niel, JCDecaux Holding, Rodolphe Saadé, and Motier Ventures in France; La Famiglia and Headline in Germany; Exor Ventures in Italy; Sofina in Belgium; and First Minute Capital and LocalGlobe in the UK. Sources close to the company stated that the round valued Mistral AI at $260 million.
On September 27, 2023, Mistral AI released its first model, Mistral 7B, a 7.3 billion parameter model that the company described as the most powerful model of its size released to date. Mistral 7B was released under the Apache 2.0 license and can be used without restrictions. Alongside the release, the company published a blog post stating its commitment to open-source AI development, including the following statements:
At Mistral AI, we believe that an open approach to generative AI is necessary. Community-backed model development is the surest path to fight censorship and bias in a technology shaping our future. We strongly believe that by training our own models, releasing them openly, and fostering community contributions, we can build a credible alternative to the emerging AI oligopoly. Open-weight generative models will play a pivotal role in the upcoming AI revolution.
On December 11, 2023, Mistral AI released its new model, Mixtral 8x7B, opened beta access to its first platform services, and announced $415 million in series A funding. Mixtral 8x7B is a sparse mixture-of-experts (SMoE) model with open weights, licensed under Apache 2.0. Mistral AI states the model outperforms Llama 2 70B on many benchmarks with 6x faster inference and matches or outperforms GPT-3.5 on most standard benchmarks. "La Plateforme," the company's commercial platform, allows developers to access, deploy, and customize Mistral AI models for production. It serves three chat endpoints for generating text from textual instructions and an embedding endpoint, each with a different performance/price tradeoff.
The $415 million series A round was led by Andreessen Horowitz (a16z), with Lightspeed Venture Partners again investing in the company. Other investors participating in the round include Salesforce, BNP Paribas, CMA-CGM, General Catalyst, Elad Gil, and Conviction. Reports state the new funding values the company at roughly $2 billion. Other reports state the funding was closer to $450 million, with $200 million from Andreessen Horowitz and $130 million from Nvidia and Salesforce in convertible debt.
Mistral 7B is a 7.3B-parameter model that supports English and code, with an 8k context window. It uses a sliding window attention (SWA) mechanism in which each layer attends to the previous 4,096 hidden states. SWA exploits the stacked layers of a transformer to attend further back than the window size: higher layers have access to information from further in the past than a single layer's attention pattern suggests. Because the attention span is fixed, the model can limit its cache using rotating buffers, saving half of the cache memory for inference on a sequence length of 8,192 without impacting model quality.
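The following toy sketch (not Mistral's implementation; window and sequence sizes are shrunk for readability) illustrates the two ideas in this paragraph: the sliding-window attention mask and the rotating-buffer cache indexing that keeps memory bounded by the window size.

```python
import torch

# Toy sizes for readability; Mistral 7B uses window = 4096 and
# sequences of up to 8192 tokens.
seq_len, window = 8, 4

# SWA mask: query position i attends to key positions j with
# i - window < j <= i (causal, but capped at `window` steps back).
i = torch.arange(seq_len).unsqueeze(1)   # query positions (column)
j = torch.arange(seq_len).unsqueeze(0)   # key positions (row)
swa_mask = (j <= i) & (j > i - window)
print(swa_mask.int())

# Rotating buffer: attention never reaches back more than `window`
# positions, so the key (and value) for position p can overwrite
# slot p % window, capping cache memory at `window` entries.
head_dim = 2
cache_k = torch.zeros(window, head_dim)
for pos in range(seq_len):
    k = torch.randn(head_dim)    # key computed at this decoding step
    cache_k[pos % window] = k    # oldest entry is overwritten
```

Since each layer can look `window` positions back, information can propagate roughly k × 4,096 positions after k stacked layers, which is how higher layers see further into the past than the per-layer mask implies.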
Mistral AI states that Mistral 7B:
- outperforms Llama 2 13B on all benchmarks,
- outperforms Llama 1 34B on many benchmarks, and
- approaches CodeLlama 7B performance on code while remaining good at English tasks.
The model was released under the Apache 2.0 license and can be used without restrictions. Users can download and run the model anywhere, including locally. It can be deployed on any cloud using the vLLM inference server and SkyPilot, and used via Hugging Face. Mistral 7B can also be fine-tuned for any task.
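As a concrete illustration, self-hosting the model with the vLLM inference server mentioned above might look like the minimal sketch below; `mistralai/Mistral-7B-v0.1` is the model's Hugging Face ID, and the prompt and sampling settings are illustrative.

```python
from vllm import LLM, SamplingParams

# Load the open-weight model from Hugging Face and generate locally.
llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Mistral AI was founded in"], params)
print(outputs[0].outputs[0].text)
```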
Mixtral 8x7B is a sparse mixture-of-experts (SMoE) model with open weights and the following capabilities:
- Handles a context of 32k tokens
- Supports English, French, Italian, German, and Spanish
- Supports code generation
- Can be fine-tuned into an instruction-following model that achieves a score of 8.3 on MT-Bench
The model uses 12 billion active parameters out of 45 billion total. Released under Apache 2.0, Mixtral 8x7B is, according to Mistral AI, the strongest open-weight model with a permissive license, outperforming Llama 2 70B on most benchmarks and matching or outperforming GPT-3.5 on most standard benchmarks.
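To make the active-versus-total parameter distinction concrete, the sketch below implements generic top-2-of-8 expert routing of the kind an SMoE layer uses (Mixtral is reported to route each token to two of its eight experts); the dimensions and module definitions are illustrative, not Mixtral's actual architecture.

```python
import torch
import torch.nn.functional as F

n_experts, top_k, d_model = 8, 2, 16   # toy dimensions

experts = [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
router = torch.nn.Linear(d_model, n_experts)

@torch.no_grad()                          # inference-only sketch
def smoe_forward(x):                      # x: (tokens, d_model)
    logits = router(x)                    # router score per expert
    weights, idx = logits.topk(top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)  # renormalize over chosen experts
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):           # each token runs only top_k experts
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])
    return out

print(smoe_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 16])
```

Only the selected experts run for a given token, which is why per-token compute tracks the 12B active parameters rather than the full 45B.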
[Table: Mistral AI model sizes]
La Plateforme is Mistral AI's developer platform for deploying and customizing its models. At launch, La Plateforme serves three chat endpoints for generating text from textual instructions and one embedding endpoint, each with a different performance/price tradeoff. The first two generative endpoints, mistral-tiny and mistral-small, use Mistral 7B and Mixtral 8x7B, respectively. The third, mistral-medium, uses a prototype model with higher performance. Mistral-embed, the embedding endpoint, serves an embedding model with an embedding dimension of 1024.
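For illustration, calling these endpoints might look like the sketch below, which assumes the platform's REST API at api.mistral.ai with an OpenAI-style JSON schema and an API key stored in a MISTRAL_API_KEY environment variable.

```python
import os
import requests

headers = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

# Chat endpoint backed by Mistral 7B (mistral-tiny).
chat = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers=headers,
    json={
        "model": "mistral-tiny",
        "messages": [{"role": "user", "content": "Summarize Mixtral 8x7B in one sentence."}],
    },
).json()
print(chat["choices"][0]["message"]["content"])

# Embedding endpoint; mistral-embed serves 1024-dimensional vectors.
emb = requests.post(
    "https://api.mistral.ai/v1/embeddings",
    headers=headers,
    json={"model": "mistral-embed", "input": ["Hello, world"]},
).json()
print(len(emb["data"][0]["embedding"]))  # expected: 1024
```

Swapping mistral-tiny for mistral-small or mistral-medium changes only the "model" field, trading cost for capability.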