MLCommons is a collaborative open engineering consortium, focused on developing the AI ecosystem through benchmarks, public datasets, and research.
MLCommons' mission is to accelerate machine learning (ML) innovation and increase its positive impact on society. While AI and ML have been around for decades, the technology is often fragmented, bespoke, and poorly understood. MLCommons aims to unlock the next stage of ML adoption by creating useful measures of quality and performance, large-scale open datasets, and common development practices and resources. MLCommons believes its efforts will help democratize ML and enable its widespread adoption into new products and services, growing ML from a research field into a mature industry.
Founded in 2020 and headquartered in San Francisco, MLCommons traces its history back to the MLPerf benchmark in 2018. The MLPerf benchmark quickly grew into a suite of industry metrics for measuring machine learning performance and promoting transparency of machine learning techniques. Starting with over 50 founding partners, MLCommons is a community-driven and community-funded effort. Its members include startups, leading companies, academics, and nonprofits from around the world. MLCommons promotes open-source and open data development, with most of its software projects available under the Apache 2.0 license and its datasets using CC-BY 4.0.
ML benchmarks provide consistent measurements of accuracy, speed, and efficiency. This enables engineers to design reliable products and helps researchers compare innovations and choose the best ideas for the future of the field. MLCommons' work on benchmarks is divided into training and inference work groups that continue to develop and release benchmarks for the industry.
MLCommons releases public datasets to help academics and entrepreneurs develop new technologies and start new companies. Datasets released by MLCommons include the following:
Dollar Street is a collection of images showing everyday household items from homes around the world, visually capturing the socioeconomic diversity of traditionally underrepresented populations. It includes 38,479 images collected from 63 different countries, tagged from a set of 289 possible topics. The metadata for each image includes demographic information such as region, country, and total monthly household income. Dollar Street consists of public domain data, licensed for academic, commercial, and non-commercial usage under CC-BY and CC-BY-SA 4.0.
The Multilingual Spoken Words Corpus is a growing audio dataset of spoken words in 50 languages for academic research and commercial applications in keyword spotting and spoken term search. The dataset contains more than 340,000 keywords, totaling 23.4 million one-second spoken examples (over 6,000 hours). It is licensed under CC-BY 4.0.
The People's Speech is one of the world’s largest English speech recognition datasets, with more than 30,000 hours of transcribed English speech from a diverse set of speakers. The dataset is large enough to train speech-to-text systems and is licensed for academic and commercial usage under CC-BY-SA and CC-BY 4.0.
The foundations of MLCommons lie in the MLPerf benchmarks, which in 2018 established industry-standard metrics to measure machine learning performance and quickly grew to encompass datasets and best practices. The community behind the MLPerf benchmarks included members from every continent and grew to over 70 supporting organizations, including software startups, researchers at top universities, and cloud computing and semiconductor giants. MLCommons grew out of this effort, and the consortium formed on December 3, 2020.