Organization attributes
BigScience is an open science project composed of hundreds of researchers around the world. It is not structured under a centralized legal entity, although there are plans to create one for data governance and community purposes. BigScience is an open collaboration bootstrapped by HuggingFace, GENCI (Grand Equipement National de Calcul Intensif), and IDRIS (Institute for Development and Resources in Intensive Scientific Computing). Organized as a research workshop, BigScience gathers academic, industrial, and independent researchers from many affiliations. There is no formal relationship between the entities with which the workshop participants are affiliated.
BigScience originated from discussions in early 2021 between Thomas Wolf (HuggingFace), Stéphane Requena (GENCI), and Pierre-François Lavallée (IDRIS). GENCI and IDRIS are the two institutions behind the French supercomputer Jean Zay. Members of HuggingFace's science team (Victor Sanh, Yacine Jernite, and others) and members of the French academic and industrial AI/NLP research communities joined the discussions, leading to a grant application for 5 million compute hours on Jean Zay in February 2021. From the beginning, the project was planned as an inclusive open-research initiative, welcoming participants from the international community to study questions surrounding large language models (LLMs).
The outcome of this collaboration is BLOOM, a multilingual, 176B-parameter, transformer-based autoregressive LLM capable of generating coherent text in forty-six natural languages and thirteen programming languages.