Mistral 7B is a 7.3 billion parameter large language model (LLM), the first foundational model developed by the French AI company Mistral AI. Mistral-7B-v0.1 was released on September 27, 2023, under the Apache 2.0 license and can be used without restrictions. The release was followed by a technical paper, submitted on October 10, 2023. The company describes Mistral 7B as a "small, yet powerful model adaptable to many use-cases"; these include text summarisation, classification, text completion, and code completion. The model has language and coding capabilities, an 8k context length, and can be customized.
Mistral AI has stated that Mistral 7B:
- outperforms Llama 2 13B on all benchmarks
- outperforms Llama 1 34B on many benchmarks
- approaches CodeLlama 7B performance on code, while remaining good at English tasks
Mistral 7B leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to handle sequences of arbitrary length at reduced inference cost. SWA exploits the stacked layers of a transformer to attend to information beyond the window size: because each layer attends over the previous layer's outputs, higher layers have access to information further in the past than the attention pattern alone suggests. A fixed attention span also means the model can limit its cache to the size of the sliding window, using rotating buffers. This halves the cache memory for inference on a sequence length of 8192, without impacting model quality.
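To make the two mechanisms concrete, below is a minimal NumPy sketch of a fixed-span causal attention mask and a rotating-buffer key/value cache. The names `sliding_window_mask` and `RollingKVCache` are illustrative assumptions, not Mistral's implementation, and the shapes are simplified to a single attention head.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where query position i may attend to key position j
    only if i - window < j <= i (causal, fixed span).
    Illustrative sketch, not Mistral's actual code."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

class RollingKVCache:
    """Rotating-buffer key/value cache: because attention never looks
    further back than `window` tokens, only the last `window` entries
    need to be stored. Hypothetical class for illustration."""
    def __init__(self, window: int, head_dim: int):
        self.window = window
        self.k = np.zeros((window, head_dim))
        self.v = np.zeros((window, head_dim))
        self.count = 0  # total tokens written so far

    def append(self, k_t: np.ndarray, v_t: np.ndarray) -> None:
        slot = self.count % self.window  # overwrite the oldest entry
        self.k[slot] = k_t
        self.v[slot] = v_t
        self.count += 1

# With a window of 4096, a sequence of length 8192 needs a cache of
# only 4096 entries -- half the memory of a full-length cache.
cache = RollingKVCache(window=4096, head_dim=128)
```

With a 4096-token window, a prompt of length 8192 never requires more than 4096 cached key/value pairs, which is the halving of cache memory described above; tokens older than the window still influence later positions indirectly through the stacked layers.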
To demonstrate the fine-tuning of Mistral 7B, the company also released Mistral 7B Instruct, trained on instruction datasets publicly available on Hugging Face. The outcome is a model that Mistral AI states outperforms all 7B models on MT-Bench, giving performance comparable to 13B chat models.
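As a usage illustration, the sketch below prompts the instruct model through the Hugging Face transformers chat-template API. The checkpoint name mistralai/Mistral-7B-Instruct-v0.1 is the published repository; the prompt and generation settings are arbitrary choices for the example.

```python
# Minimal sketch: prompting Mistral 7B Instruct with transformers.
# Assumes `pip install transformers torch` and enough memory/GPU to
# load the 7B checkpoint; settings here are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The tokenizer's chat template wraps the message in Mistral's
# [INST] ... [/INST] instruction format.
messages = [{"role": "user", "content": "Classify the sentiment: 'What a great film!'"}]
input_ids = tok.apply_chat_template(messages, return_tensors="pt")

output = model.generate(input_ids, max_new_tokens=64)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```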
Mistral 7B can be downloaded and run anywhere, including locally, using the accompanying reference implementation documentation. The model can be deployed on any cloud using the vLLM inference server and SkyPilot. Mistral 7B is also available on Hugging Face.
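As a deployment illustration, here is a minimal offline-inference sketch using vLLM's Python API. It assumes vLLM is installed and the Mistral-7B-v0.1 weights are reachable on Hugging Face; the sampling parameters are arbitrary example values.

```python
# Minimal sketch: running Mistral 7B locally with the vLLM engine.
# Assumes `pip install vllm` and a GPU large enough for the 7B weights.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Mistral 7B is a language model that"], params)
print(outputs[0].outputs[0].text)
```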
Mistral AI released a comparison of Mistral 7B to the Llama 2 family of models, re-running all model evaluations themselves. Benchmarks used for evaluation cover commonsense reasoning, world knowledge, reading comprehension, mathematics, code, and popular aggregated results such as MMLU.
Figure: Results comparing Mistral 7B performance to three Llama models.