Mistral 7B is a 7.3 billion parameter large language model (LLM), the first foundational model developed by the French AI company Mistral AI. Mistral-7B-v0.1 was released on September 27, 2023, under the Apache 2.0 license and can be used without restrictions. The release was followed by a technical paper, submitted on October 10, 2023. The company describes Mistral 7B as a "small, yet powerful model adaptable to many use-cases"; these include text summarisation, classification, text completion, and code completion. The model has language and coding capabilities, an 8k context length, and can be customized.
Mistral AI has stated that Mistral 7B:
- outperforms Llama 2 13B on all benchmarks
- outperforms Llama 1 34B on many benchmarks
- approaches CodeLlama 7B performance on code, while remaining good at English tasks
Mistral 7B leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to handle sequences of arbitrary length at reduced inference cost. SWA exploits the stacked layers of a transformer to attend to information beyond the window size: because each layer attends over the previous layer's outputs, higher layers have access to information further in the past than the attention pattern alone suggests. A fixed attention span also means the model can limit its cache to the size of the sliding window, using rotating buffers. This halves the cache memory for inference on a sequence length of 8192, without impacting model quality.
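To make the two mechanisms concrete, below is a minimal NumPy sketch of a fixed-span causal attention mask and a rotating-buffer key/value cache. The names `sliding_window_mask` and `RollingKVCache` are illustrative assumptions, not Mistral's implementation, and the shapes are simplified to a single attention head.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where query position i may attend to key position j
    only if i - window < j <= i (causal, fixed span).
    Illustrative sketch, not Mistral's actual code."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

class RollingKVCache:
    """Rotating-buffer key/value cache: because attention never looks
    further back than `window` tokens, only the last `window` entries
    need to be stored. Hypothetical class for illustration."""
    def __init__(self, window: int, head_dim: int):
        self.window = window
        self.k = np.zeros((window, head_dim))
        self.v = np.zeros((window, head_dim))
        self.count = 0  # total tokens written so far

    def append(self, k_t: np.ndarray, v_t: np.ndarray) -> None:
        slot = self.count % self.window  # overwrite the oldest entry
        self.k[slot] = k_t
        self.v[slot] = v_t
        self.count += 1

# With a window of 4096, a sequence of length 8192 needs a cache of
# only 4096 entries -- half the memory of a full-length cache.
cache = RollingKVCache(window=4096, head_dim=128)
```

With a 4096-token window, a prompt of length 8192 never requires more than 4096 cached key/value pairs, which is the halving of cache memory described above; tokens older than the window still influence later positions indirectly through the stacked layers.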
To demonstrate the fine-tuning of Mistral 7B, the company also released Mistral 7B Instruct, trained on instruction datasets publicly available on Hugging Face. The outcome is a model that Mistral AI states outperforms all 7B models on MT-Bench, giving performance comparable to 13B chat models.
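As a usage illustration, the sketch below prompts the instruct model through the Hugging Face transformers chat-template API. The checkpoint name mistralai/Mistral-7B-Instruct-v0.1 is the published repository; the prompt and generation settings are arbitrary choices for the example.

```python
# Minimal sketch: prompting Mistral 7B Instruct with transformers.
# Assumes `pip install transformers torch` and enough memory/GPU to
# load the 7B checkpoint; settings here are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The tokenizer's chat template wraps the message in Mistral's
# [INST] ... [/INST] instruction format.
messages = [{"role": "user", "content": "Classify the sentiment: 'What a great film!'"}]
input_ids = tok.apply_chat_template(messages, return_tensors="pt")

output = model.generate(input_ids, max_new_tokens=64)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```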
Mistral 7B can be downloaded and run anywhere, including locally, using the accompanying reference implementation documentation. The model can be deployed on any cloud using the vLLM inference server and SkyPilot. Mistral 7B is also available on Hugging Face.
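As a deployment illustration, here is a minimal offline-inference sketch using vLLM's Python API. It assumes vLLM is installed and the Mistral-7B-v0.1 weights are reachable on Hugging Face; the sampling parameters are arbitrary example values.

```python
# Minimal sketch: running Mistral 7B locally with the vLLM engine.
# Assumes `pip install vllm` and a GPU large enough for the 7B weights.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Mistral 7B is a language model that"], params)
print(outputs[0].outputs[0].text)
```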
Mistral AI released a comparison of Mistral 7B to the Llama 2 family of models, re-running all model evaluations themselves. Benchmarks used for evaluation cover commonsense reasoning, world knowledge, reading comprehension, mathematics, code, and popular aggregated results such as MMLU.
Figure: Results comparing Mistral 7B performance to three Llama models.