Mistral 7B is a 7.3 billion parameter large language model (LLM) and the first foundation model developed by the French AI company Mistral AI. Mistral-7B-v0.1 was released on September 27, 2023, under the Apache 2.0 license and can be used without restrictions; a technical paper followed on October 10, 2023. The company describes Mistral 7B as a "small, yet powerful model adaptable to many use-cases", including text summarisation, classification, text completion, and code completion. The model has language and coding capabilities, an 8k context length, and can be customised.
Mistral AI has stated that Mistral 7B:
- Outperforms Llama 2 13B on all benchmarks
- Outperforms Llama 1 34B on many benchmarks
- Approaches CodeLlama 7B performance on code
Mistral 7B uses grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to handle sequences of arbitrary length at reduced inference cost. SWA exploits the stacked layers of a transformer to attend further into the past than the window size: because each layer attends over the previous layer's hidden states, higher layers can access information from further back than the per-layer attention pattern suggests. The fixed attention span also lets the model cap its cache at sliding-window size using a rotating buffer, as sketched below; according to Mistral AI, this halves cache memory for inference on a sequence length of 8192 without impacting model quality.
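The following is a minimal sketch of the rotating-buffer idea, not Mistral's reference implementation: names such as RollingKVCache, the head dimension, and the stored layout are illustrative. The point is that slot `i % window` is overwritten in place, so cache memory stays constant however long the sequence grows.

```python
# Illustrative rolling-buffer KV cache for sliding window attention.
import numpy as np

WINDOW = 4096  # Mistral 7B's sliding window size

class RollingKVCache:
    """Keeps only the last WINDOW keys/values, overwriting old slots in place."""

    def __init__(self, window: int, head_dim: int):
        self.window = window
        self.keys = np.zeros((window, head_dim), dtype=np.float32)
        self.values = np.zeros((window, head_dim), dtype=np.float32)
        self.pos = 0  # absolute position of the next token

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Token at absolute position p lives in slot p % window, so memory
        # usage is fixed no matter how long the sequence grows.
        slot = self.pos % self.window
        self.keys[slot] = k
        self.values[slot] = v
        self.pos += 1

    def visible(self) -> tuple[np.ndarray, np.ndarray]:
        # The current token attends to at most the last `window` tokens.
        # (Entries are returned in buffer order; a real implementation would
        # also track absolute positions for the positional embeddings.)
        n = min(self.pos, self.window)
        return self.keys[:n], self.values[:n]

cache = RollingKVCache(window=WINDOW, head_dim=128)
for _ in range(10_000):  # a sequence far longer than the window
    k = v = np.random.randn(128).astype(np.float32)
    cache.append(k, v)

keys, values = cache.visible()
print(keys.shape)  # (4096, 128): cache size is bounded by the window
```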
To demonstrate fine-tuning of Mistral 7B, the company also released Mistral 7B Instruct, trained on instruction datasets publicly available on HuggingFace. Mistral AI states that the resulting model outperforms all other 7B models on MT-Bench and is comparable to 13B chat models.
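A hedged sketch of prompting the instruct model through the HuggingFace transformers library is shown below. It assumes a recent transformers version with chat-template support and a GPU with enough memory for fp16 weights; the prompt text is made up for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The chat template wraps the message in the [INST] ... [/INST] format
# the instruct model was fine-tuned on.
messages = [{"role": "user", "content": "Summarise sliding window attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```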
Mistral 7B can be downloaded and run anywhere, including locally, using the accompanying reference implementation. The model can be deployed on any cloud using the vLLM inference server and SkyPilot, and is also available on HuggingFace.
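As a rough illustration of the vLLM path, the sketch below runs offline batch inference with vLLM's Python API, assuming vLLM is installed and the weights can be pulled from HuggingFace; the prompt and sampling settings are arbitrary. vLLM can also expose the model as a server via its API server entrypoint, which is the mode used for cloud deployment.

```python
from vllm import LLM, SamplingParams

# Load Mistral 7B into the vLLM engine (downloads weights on first use).
llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(temperature=0.7, max_tokens=100)

prompts = ["The three laws of robotics are"]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```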
Mistral AI released a comparison of Mistral 7B to the Llama 2 family of models, re-running all evaluations itself. The benchmarks used for evaluation include the following:
- Commonsense reasoning: 0-shot average of HellaSwag, Winogrande, PIQA, SIQA, OpenBookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.
- World knowledge: 5-shot average of NaturalQuestions and TriviaQA.
- Reading comprehension: 0-shot average of BoolQ and QuAC.
- Math: average of 8-shot GSM8K with maj@8 and 4-shot MATH with maj@4 (see the maj@k sketch after this list).
- Code: average of 0-shot HumanEval and 3-shot MBPP.
- Popular aggregated results: 5-shot MMLU, 3-shot BBH, and 3-5-shot AGIEval (English multiple-choice questions only).
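The maj@k scoring used for the math benchmarks means the model samples k answers per problem and is credited only if the most common answer is correct. A small illustrative sketch, with made-up sample data:

```python
from collections import Counter

def maj_at_k(sampled_answers: list[str], reference: str) -> bool:
    """Return True if the most frequent of the k sampled answers is correct."""
    majority_answer, _ = Counter(sampled_answers).most_common(1)[0]
    return majority_answer == reference

# e.g. maj@8 on one GSM8K-style problem: 8 sampled answers, reference "42".
samples = ["42", "41", "42", "42", "40", "42", "42", "39"]
print(maj_at_k(samples, "42"))  # True: "42" wins the vote and matches
```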