AudioCraft is an open-source PyTorch library for audio processing and generation with deep learning, developed by Meta AI. AudioCraft offers users a range of generative audio capabilities (music, sound effects, and compression after training on raw audio signals) in a single code base. It consists of three models: MusicGen, AudioGen, and EnCodec.
Both MusicGen and AudioGen consist of a single autoregressive language model (LM) operating over streams of compressed discrete audio representations (tokens). Meta AI introduced an approach that leverages the internal structure of the parallel streams of tokens, showing that a token interleaving pattern can efficiently model audio sequences while also capturing long-term dependencies in the audio. These models use EnCodec to learn discrete audio tokens from raw waveforms: the codec maps audio signals to one or several parallel streams of discrete tokens, and a single autoregressive language model then recursively models those tokens. Generated tokens are fed to the EnCodec decoder to map them back to the audio space, producing an output waveform. Different types of conditioning models can control the generation, including a pretrained text encoder for text-to-audio applications.
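This pipeline (text encoder conditioning, autoregressive token modeling, EnCodec decoding) is wrapped behind the library's high-level model classes. A minimal sketch following the usage pattern from the AudioCraft README; the checkpoint name, prompt, and generation parameters here are illustrative, and exact APIs may vary between library versions:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained text-to-music model; 'small' is the
# 300M-parameter variant in the AudioCraft model zoo.
model = MusicGen.get_pretrained('facebook/musicgen-small')

# The LM samples EnCodec tokens autoregressively until the
# requested duration of audio has been produced.
model.set_generation_params(duration=8)  # seconds

# The text prompt is passed through the pretrained text encoder
# to condition the language model.
descriptions = ['upbeat electronic track with a driving bassline']
wav = model.generate(descriptions)  # tensor [batch, channels, samples]

# The generated tokens have already been mapped back to a waveform
# by the EnCodec decoder; write it out with loudness normalization.
audio_write('output', wav[0].cpu(), model.sample_rate, strategy='loudness')
```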
AudioCraft was released on August 2, 2023. Meta AI chose to open-source the AudioCraft models, allowing users to train their own models on their own datasets. The AudioCraft code is released under the MIT license, and the model weights are released under the CC-BY-NC 4.0 license. Meta has released demos of the models with samples of audio generated by both the text-to-sound and text-to-music models. The company aims for the AudioCraft models to be used as tools for musicians and sound designers, helping users brainstorm new ideas or iterate on their existing compositions in new ways. Meta has also suggested MusicGen could become a new type of instrument, much as synthesizers were when they were first adopted.
The MusicGen model was first described in a paper released on June 8, 2023, titled "Simple and Controllable Music Generation." The model was developed by the FAIR team at Meta AI and trained between April 2023 and May 2023. The training dataset consisted of roughly 400,000 recordings with text descriptions and metadata, amounting to 20,000 hours of music owned by Meta or licensed from the following sources: the Meta Music Initiative Sound Collection, the Shutterstock music collection, and the Pond5 music collection.
MusicGen consists of an EnCodec model for audio tokenization and an auto-regressive language model based on the transformer architecture for music modeling. MusicGen is available in three sizes (300M, 1.5B, and 3.3B parameters) and two variants (text-to-music generation and melody-guided music generation). The model was evaluated using standard music benchmarks, including the Fréchet Audio Distance, KL divergence, and CLAP score.
Additional qualitative studies with human participants were also used to evaluate the model, rating criteria such as the overall quality of the samples and their relevance to the provided text input.
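The melody-guided variant additionally conditions the generation on a chromagram extracted from a reference recording. A hedged sketch, assuming the 'facebook/musicgen-melody' checkpoint and the generate_with_chroma entry point from the AudioCraft README; the reference file path and prompt are placeholders:

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# The melody variant conditions on both a text description and a
# chromagram computed from a reference waveform.
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8)

# 'reference.wav' is a placeholder path for the melody to follow.
melody, sr = torchaudio.load('reference.wav')

wav = model.generate_with_chroma(
    descriptions=['orchestral rendition with strings and brass'],
    melody_wavs=melody[None],  # add a batch dimension: [1, channels, samples]
    melody_sample_rate=sr,
)
audio_write('melody_guided', wav[0].cpu(), model.sample_rate, strategy='loudness')
```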
AudioGen was also developed by the FAIR team at Meta AI. A paper describing version one of the model, titled "AudioGen: Textually Guided Audio Generation," was released on September 30, 2022.
Version 2 of AudioGen, released as part of AudioCraft, was trained between July 2023 and August 2023 on a range of public data sources, including AudioSet and AudioCaps.
AudioGen consists of an EnCodec model for audio tokenization and an auto-regressive language model based on the transformer architecture for audio modeling. Version 2 was enhanced by training on 10-second samples rather than the 5-second samples used for version 1, by using an EnCodec model retrained on environmental sound data, and by dropping the audio mixing augmentations. Version 2 has 1.5 billion parameters. AudioGen was evaluated using objective benchmarks such as the Fréchet Audio Distance and KL divergence.
As with MusicGen, qualitative studies with human participants were also undertaken.
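Usage mirrors MusicGen. A minimal sketch, assuming the 'facebook/audiogen-medium' checkpoint (the 1.5B-parameter version 2 weights) from the AudioCraft model zoo; the prompts and duration are illustrative:

```python
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

# Load the pretrained text-to-sound model.
model = AudioGen.get_pretrained('facebook/audiogen-medium')

# Version 2 was trained on 10-second samples, so durations up to
# roughly that length are a natural choice.
model.set_generation_params(duration=5)

# Environmental sound prompts rather than musical ones.
wav = model.generate(['dog barking in the distance',
                      'sirens of an emergency vehicle'])

for idx, one_wav in enumerate(wav):
    audio_write(f'sound_{idx}', one_wav.cpu(), model.sample_rate,
                strategy='loudness')
```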
EnCodec was first released by Meta AI on October 25, 2022. The model was described in a paper titled "High Fidelity Neural Audio Compression." EnCodec consists of three parts: an encoder that compresses the raw waveform into a lower-frame-rate latent representation, a residual vector quantizer that discretizes that representation into parallel streams of tokens, and a decoder that reconstructs the waveform from the tokens.
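As a standalone codec, EnCodec can round-trip audio through the discrete token space. A minimal sketch using Meta's standalone encodec package; the 24 kHz model, the 6 kbps target bandwidth, and the file paths are illustrative choices:

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load the pretrained 24 kHz codec and pick a target bitrate; the
# residual vector quantizer uses more token streams at higher bandwidths.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # kbps

# 'input.wav' is a placeholder; resample/remix to the codec's format.
wav, sr = torchaudio.load('input.wav')
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    # Encoder + quantizer: waveform -> parallel streams of discrete tokens.
    encoded_frames = model.encode(wav[None])
    # Decoder: tokens -> reconstructed waveform.
    reconstructed = model.decode(encoded_frames)[0]

torchaudio.save('roundtrip.wav', reconstructed.cpu(), model.sample_rate)
```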