MLCommons is a collaborative open engineering consortium focused on developing the AI ecosystem through benchmarks, public datasets, and research. MLCommons' mission is to accelerate machine learning (ML) innovation and increase its positive impact on society. While AI and ML have been around for decades, the technology is often fragmented, bespoke, and poorly understood. MLCommons aims to unlock the next stage of ML adoption by creating useful measures of quality and performance, large-scale open data sets, and common development practices and resources. MLCommons believes its efforts will help democratize ML and enable its widespread adoption into new products and services, growing ML from a research field into a mature industry.
MLCommons was founded in 2020 and is headquartered in San Francisco, but its history goes back to the MLPerf benchmark in 2018. The MLPerf benchmark quickly grew into a suite of industry metrics for measuring machine learning performance and promoting transparency of machine learning techniques. Starting with over fifty founding partners, MLCommons is a community-driven and community-funded effort. Its members include start-ups, leading companies, academics, and nonprofits from around the world. MLCommons promotes open-source and open data development, with most of its software projects available under the Apache 2.0 license and its datasets using CC-BY 4.0.
MLCommons has five key principles:
- Grow ML markets and make the world a better place
- Get everyone involved (be global, inclusive, and fair; bring together academia, small companies, large companies, non-profits, etc.; make it easy to get involved; be as open with its IP as possible while sustaining the community)
- Act through collaborative engineering (keep leadership mostly technical, with an emphasis on hands-on involvement; favor data-driven decisions, design simplicity, and focus on real user value)
- Make fast but consensus-supported decisions (very low barrier for “experimental” working groups with well-reviewed path to full endorsement; favor grudging consensus over 51/49 votes, especially for big decisions; make technical contributions easy; favor rapid development and iteration)
- Build a community that people want to be part of (be welcoming, informal, and friendly; encourage, recognize, and reward contributions; celebrate with cake)
ML benchmarks provide consistent measurements of accuracy, speed, and efficiency. This enables engineers to design reliable products and helps researchers compare innovations and choose the best ideas for the future of the field. MLCommons' work on benchmarks is divided into training and inference working groups that continue to develop and release benchmarks for the industry.
MLCommons releases public datasets to help academics and entrepreneurs develop new technologies and start new companies. Datasets released by MLCommons include the following:
Dollar Street is a collection of images showing everyday household items from homes around the world, visually capturing the socioeconomic diversity of traditionally underrepresented populations. It includes 38,479 images collected from 63 different countries, tagged from a set of 289 possible topics. The metadata for each image includes demographic information such as region, country, and total household monthly income. Dollar Street consists of public domain data, licensed for academic, commercial, and non-commercial usage, under CC-BY and CC-BY-SA 4.0.
The Multilingual Spoken Words Corpus is a growing audio dataset of spoken words in fifty languages for academic research and commercial applications in keyword spotting and spoken term search. The dataset contains more than 340,000 keywords, totaling 23.4 million 1-second spoken examples (over 6,000 hours). It is licensed under CC-BY 4.0.
People's Speech is one of the world’s largest English speech recognition datasets, including more than 30,000 hours of transcribed English speech from a diverse set of speakers. The People's Speech dataset is large enough to train speech-to-text systems and is licensed for academic and commercial usage under CC-BY-SA and CC-BY 4.0.
MLCommons traces its origins to the MLPerf benchmarks, which launched in 2018 to establish industry-standard metrics for machine learning performance and quickly grew to encompass data sets and best practices. The community behind the MLPerf benchmarks included members from every continent and grew to over seventy supporting organizations, including software start-ups, researchers at top universities, and cloud computing and semiconductor giants. MLCommons grew out of this effort, and the consortium formed on December 3, 2020.