Product attributes
Other attributes
Gemini is a family of multimodal AI models developed at Google that have been trained jointly across image, audio, video, and text data. Gemini models were designed with the goal of having general capabilities across modalities while also demonstrating strong performance in each domain. The models underwent large-scale pre-training as well as post-training to improve quality, enhance capabilities, and help alignment to meet model safety criteria.
A natively multimodal family of models, unlike other models that involve combining separate components for different modalities, Gemini was pre-trained from the start for different modalities. This results in enhanced capabilities, including sophisticated reasoning, advanced coding, and understanding text, images, audio, and more. Google describes Gemini's capabilities as having "the potential to transform any type of input into any type of output." Examples of Gemini use cases include generating both text and images, as well as combined outputs, coding, summarization, processing and analyzing raw audio, explaining reasoning in complex fields such as maths and physics, and understanding user intent to create bespoke experiences interacting with the models.
Google launched Gemini on December 6, 2023, with the first version (Gemini 1.0) including three model sizes:
- Gemini Ultra—the largest and most capable model for highly complex tasks
- Gemini Pro—the best model for scaling across a wide range of tasks
- Gemini Nano—the most efficient model for on-device tasks
Gemini Pro & Nano were made available from launch, with Google stating Gemini Ultra was in the process of "extensive trust and safety checks." At launch, Google announced Gemini would be powering Bard, its conversational chatbot, launched in February 2023. In February 2024, Google announced Bard was rebranding as "Gemini," becoming a part of the company's multimodal family of models. The chatbot, available in forty languages, is available on the web, through a dedicated Android app, and on the Google app on iOS. Also in February 2024, Google launched Gemini Ultra 1.0, its largest model and the first to outperform human experts on MMLU (massive multitask language understanding) benchmarks. The version of Google's chatbot powered by Ultra is called Gemini Advanced, available by subscribing to Google's One AI premium plan, which also offers expanded storage and exclusive product features. The Google One AI Premium plan is priced at $20 a month.
Gemini models can be divided into two post-trained family variants:
- Chat-focused variants referred to as Gemini App models, optimized for Gemini and Gemini Advanced, including the chatbot formerly known as Bard.
- Developer-focused variants, referred to as Gemini API models, optimized for a range of products and accessible through Google AI Studio and Cloud Vertex AI.
On February 15, 2024, Google announced the next generation of Gemini, called Gemini 1.5. The first model released in limited preview is Gemini 1.5 Pro, a mid-size multimodal model for a wide range of tasks with a standard context window. Upon launch, 1.5 Pro was available in thirty-eight languages across 180+ countries and territories. 1.5 Pro comes with a 128,000 context window with a limited group of developers able to test a 1 million token context window during a private preview available through AI Studio and Vertex AI. Google states Gemini 1.5 shows significant improvements across a range of dimensions, with 1.5 Pro demonstrating comparable performance to 1.0 Ultra while using less compute.
Gemini 1.5 incorporates a new Mixture-of-Experts (MoE) architecture that Google states makes it more efficient to train and serve. MoE models are divided into smaller neural networks, which are selectively activated depending on the type of input. The company says Gemini 1.5's larger context windows enable the models to process 1 hour of video hours of audio, codebases with over 30,000 lines of code or over 700,000 words, with testing underway to increase the context window up to 10 million tokens. 1.5 Pro outperforms 1.0 Pro on 87% of benchmarks, demonstrating a similar level as 1.0 Ultra. This performance is maintained even as the context window increases.
Google announced Gemini on December 6, 2023. DeepMind developed the model collaborating with other teams across Google, including Google Research. After its launch, Google began rolling out a number of products powered by Gemini. This included the company's chatbot Bard, which ran on a version of Gemini Pro fine-tuned for advanced reasoning, planning, understanding, and more. The new Gemini-powered Bard was initially available in English in more than 170 countries and territories, with Google planning to expand to different modalities and support new languages. The Google Pixel 8 Pro was engineered for running Gemini Nano locally to power features such as Summarize in the Recorder app and Smart Reply in Gboard.
Upon the announcement of Gemini, Google stated the model would be made available in services like Search, Ads, Chrome, and Duet AI. From December 13, 2023, developers and enterprise customers were given access to Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI. At launch, Google also stated Gemini Ultra is still being tested for safety and will roll out to developers and enterprise customers in early 2024.
On February 8, 2024, Google announced that Bard was rebranding to Gemini, becoming part of the company's multimodal family of models, and Gemini Ultra 1.0 was being released. The Gemini chatbot (formerly Bard) became available in 40 languages via a web app, the Gemini app on Android, and the Google app on iOS. The most advanced version of the chatbot running on Ultra is called Gemini Advanced and can be accessed via a Google One AI premium plan. Google also announced Gemini models were coming to a range of other products including Workspace and Google Cloud. Additionally, Android app users can set Gemini as their default assistant, replacing Google Assistant, and switch from Search to Gemini on the Google App on iOS.
On February 15, 2024, Google announced cloud customers can access Gemini 1.0 Ultra using the Gemini API in AI Studio and Vertex AI, and the next generation of its multimodal AI models Gemini 1.5. The company states that Gemini 1.5 demonstrates improvements across a number of dimensions compared to 1.0 Ultra while using less compute thanks to a new MoE architecture. The first model released in limited preview for testing is Gemini 1.5 Pro, a mid-size multimodal model that performs at a similar level to 1.0 Ultra. 1.5 Pro has a standard 128,000 token context window with plans to introduce pricing tiers that scale up to a 1 million token window. Developers and enterprise customers access 1.5 Pro via AI Studio and Vertex AI. Early testers were also given access to try the 1 million context window at no cost during the testing period.