Mistral 7B is a 7.3 billion parameter large language model (LLM) and the first foundation model developed by the French AI company Mistral AI. Mistral-7B-v0.1 was released on September 27, 2023, under the Apache 2.0 license and can be used without restrictions; a technical paper followed on October 10, 2023. The company describes Mistral 7B as a "small, yet powerful model adaptable to many use-cases", including text summarisation, classification, text completion, and code completion. The model has language and coding capabilities, an 8k context length, and can be customised.
Mistral AI has stated that Mistral 7B:
- Outperforms Llama 2 13B on all benchmarks
- Outperforms Llama 1 34B on many benchmarks
- Approaches CodeLlama 7B performance on code
Mistral 7B uses grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to handle sequences of arbitrary length at reduced inference cost. SWA exploits the stacked layers of a transformer to attend further into the past than the window size: because each layer attends over the previous layer's hidden states, higher layers can access information from further back than the per-layer attention pattern suggests. The fixed attention span also lets the model cap its cache at sliding-window size using a rotating buffer, as sketched below; according to Mistral AI, this halves cache memory for inference on a sequence length of 8192 without impacting model quality.
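The following is a minimal sketch of the rotating-buffer idea, not Mistral's reference implementation: names such as RollingKVCache, the head dimension, and the stored layout are illustrative. The point is that slot `i % window` is overwritten in place, so cache memory stays constant however long the sequence grows.

```python
# Illustrative rolling-buffer KV cache for sliding window attention.
import numpy as np

WINDOW = 4096  # Mistral 7B's sliding window size

class RollingKVCache:
    """Keeps only the last WINDOW keys/values, overwriting old slots in place."""

    def __init__(self, window: int, head_dim: int):
        self.window = window
        self.keys = np.zeros((window, head_dim), dtype=np.float32)
        self.values = np.zeros((window, head_dim), dtype=np.float32)
        self.pos = 0  # absolute position of the next token

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        # Token at absolute position p lives in slot p % window, so memory
        # usage is fixed no matter how long the sequence grows.
        slot = self.pos % self.window
        self.keys[slot] = k
        self.values[slot] = v
        self.pos += 1

    def visible(self) -> tuple[np.ndarray, np.ndarray]:
        # The current token attends to at most the last `window` tokens.
        # (Entries are returned in buffer order; a real implementation would
        # also track absolute positions for the positional embeddings.)
        n = min(self.pos, self.window)
        return self.keys[:n], self.values[:n]

cache = RollingKVCache(window=WINDOW, head_dim=128)
for _ in range(10_000):  # a sequence far longer than the window
    k = v = np.random.randn(128).astype(np.float32)
    cache.append(k, v)

keys, values = cache.visible()
print(keys.shape)  # (4096, 128): cache size is bounded by the window
```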
To demonstrate fine-tuning of Mistral 7B, the company also released Mistral 7B Instruct, trained on instruction datasets publicly available on HuggingFace. Mistral AI states that the resulting model outperforms all other 7B models on MT-Bench and is comparable to 13B chat models.
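A hedged sketch of prompting the instruct model through the HuggingFace transformers library is shown below. It assumes a recent transformers version with chat-template support and a GPU with enough memory for fp16 weights; the prompt text is made up for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The chat template wraps the message in the [INST] ... [/INST] format
# the instruct model was fine-tuned on.
messages = [{"role": "user", "content": "Summarise sliding window attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```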
Mistral 7B can be downloaded and run anywhere, including locally, using the accompanying reference implementation. The model can be deployed on any cloud using the vLLM inference server and SkyPilot, and is also available on HuggingFace.
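As a rough illustration of the vLLM path, the sketch below runs offline batch inference with vLLM's Python API, assuming vLLM is installed and the weights can be pulled from HuggingFace; the prompt and sampling settings are arbitrary. vLLM can also expose the model as a server via its API server entrypoint, which is the mode used for cloud deployment.

```python
from vllm import LLM, SamplingParams

# Load Mistral 7B into the vLLM engine (downloads weights on first use).
llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(temperature=0.7, max_tokens=100)

prompts = ["The three laws of robotics are"]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```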
Mistral AI released a comparison of Mistral 7B to the Llama 2 family of models, re-running all evaluations itself. The benchmarks used for evaluation include the following:
- Commonsense reasoning: 0-shot average of HellaSwag, Winogrande, PIQA, SIQA, OpenBookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.
- World knowledge: 5-shot average of NaturalQuestions and TriviaQA.
- Reading comprehension: 0-shot average of BoolQ and QuAC.
- Math: average of 8-shot GSM8K with maj@8 and 4-shot MATH with maj@4 (see the maj@k sketch after this list).
- Code: average of 0-shot HumanEval and 3-shot MBPP.
- Popular aggregated results: 5-shot MMLU, 3-shot BBH, and 3-5-shot AGIEval (English multiple-choice questions only).
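The maj@k scoring used for the math benchmarks means the model samples k answers per problem and is credited only if the most common answer is correct. A small illustrative sketch, with made-up sample data:

```python
from collections import Counter

def maj_at_k(sampled_answers: list[str], reference: str) -> bool:
    """Return True if the most frequent of the k sampled answers is correct."""
    majority_answer, _ = Counter(sampled_answers).most_common(1)[0]
    return majority_answer == reference

# e.g. maj@8 on one GSM8K-style problem: 8 sampled answers, reference "42".
samples = ["42", "41", "42", "42", "40", "42", "42", "39"]
print(maj_at_k(samples, "42"))  # True: "42" wins the vote and matches
```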