AudioCraft is an open-source PyTorch library for audio processing and generation with deep learning, developed by Meta AI. AudioCraft offers users a range of generative audio capabilities (music, sound effects, and compression after training on raw audio signals) in a single code base. It consists of three models: MusicGen, AudioGen, and EnCodec.
Both MusicGen and AudioGen consist of a single autoregressive language model (LM) operating over streams of compressed discrete audio representations (tokens). Meta AI introduced an approach that leverages the internal structure of the parallel streams of tokens, showing that a token interleaving pattern can efficiently model audio sequences while also capturing long-term dependencies in the audio. These models use EnCodec to learn discrete audio tokens from raw waveforms: the codec maps audio signals to one or several parallel streams of discrete tokens, and a single autoregressive language model then recursively models those tokens. Generated tokens are fed to the EnCodec decoder to map them back to the audio space, producing an output waveform. Different types of conditioning models can control the generation, including a pretrained text encoder for text-to-audio applications.
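This pipeline (text encoder conditioning, autoregressive token modeling, EnCodec decoding) is wrapped behind the library's high-level model classes. A minimal sketch following the usage pattern from the AudioCraft README; the checkpoint name, prompt, and generation parameters here are illustrative, and exact APIs may vary between library versions:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained text-to-music model; 'small' is the
# 300M-parameter variant in the AudioCraft model zoo.
model = MusicGen.get_pretrained('facebook/musicgen-small')

# The LM samples EnCodec tokens autoregressively until the
# requested duration of audio has been produced.
model.set_generation_params(duration=8)  # seconds

# The text prompt is passed through the pretrained text encoder
# to condition the language model.
descriptions = ['upbeat electronic track with a driving bassline']
wav = model.generate(descriptions)  # tensor [batch, channels, samples]

# The generated tokens have already been mapped back to a waveform
# by the EnCodec decoder; write it out with loudness normalization.
audio_write('output', wav[0].cpu(), model.sample_rate, strategy='loudness')
```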
AudioCraft was released on August 2, 2023. Meta AI chose to open-source the AudioCraft models, allowing users to train their own models on their own datasets. The AudioCraft code is released under the MIT license, and the model weights are released under the CC-BY-NC 4.0 license. Meta has released demos of the models with samples of audio generated by both the text-to-sound and text-to-music models. The company aims for the AudioCraft models to be used as tools for musicians and sound designers, helping users brainstorm new ideas or iterate on their existing compositions in new ways. Meta has also suggested MusicGen could become a new type of instrument, much as synthesizers were when they were first adopted.
The MusicGen model was first described in a paper released on June 8, 2023, titled "Simple and Controllable Music Generation." The model was developed by the FAIR team at Meta AI and trained between April 2023 and May 2023. The training dataset consisted of roughly 400,000 recordings with text descriptions and metadata, amounting to 20,000 hours of music owned by Meta or licensed from the following sources: the Meta Music Initiative Sound Collection, the Shutterstock music collection, and the Pond5 music collection.
MusicGen consists of an EnCodec model for audio tokenization and an auto-regressive language model based on the transformer architecture for music modeling. MusicGen is available in three sizes (300M, 1.5B, and 3.3B parameters) and two variants (text-to-music generation and melody-guided music generation). The model was evaluated using standard music benchmarks, including the Fréchet Audio Distance, KL divergence, and CLAP score.
Additional qualitative studies with human participants were also used to evaluate the model, rating criteria such as the overall quality of the samples and their relevance to the provided text input.
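The melody-guided variant additionally conditions the generation on a chromagram extracted from a reference recording. A hedged sketch, assuming the 'facebook/musicgen-melody' checkpoint and the generate_with_chroma entry point from the AudioCraft README; the reference file path and prompt are placeholders:

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# The melody variant conditions on both a text description and a
# chromagram computed from a reference waveform.
model = MusicGen.get_pretrained('facebook/musicgen-melody')
model.set_generation_params(duration=8)

# 'reference.wav' is a placeholder path for the melody to follow.
melody, sr = torchaudio.load('reference.wav')

wav = model.generate_with_chroma(
    descriptions=['orchestral rendition with strings and brass'],
    melody_wavs=melody[None],  # add a batch dimension: [1, channels, samples]
    melody_sample_rate=sr,
)
audio_write('melody_guided', wav[0].cpu(), model.sample_rate, strategy='loudness')
```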
AudioGen was also developed by the FAIR team at Meta AI. A paper describing version one of the model, titled "AudioGen: Textually Guided Audio Generation," was released on September 30, 2022.
Version 2 of AudioGen, released as part of AudioCraft, was trained between July 2023 and August 2023 on a range of public data sources, including AudioSet and AudioCaps.
AudioGen consists of an EnCodec model for audio tokenization and an auto-regressive language model based on the transformer architecture for audio modeling. Version 2 was enhanced by training on 10-second samples rather than the 5-second samples used for version 1, by using an EnCodec model retrained on environmental sound data, and by dropping the audio mixing augmentations. Version 2 has 1.5 billion parameters. AudioGen was evaluated using objective benchmarks such as the Fréchet Audio Distance and KL divergence.
As with MusicGen, qualitative studies with human participants were also undertaken.
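Usage mirrors MusicGen. A minimal sketch, assuming the 'facebook/audiogen-medium' checkpoint (the 1.5B-parameter version 2 weights) from the AudioCraft model zoo; the prompts and duration are illustrative:

```python
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

# Load the pretrained text-to-sound model.
model = AudioGen.get_pretrained('facebook/audiogen-medium')

# Version 2 was trained on 10-second samples, so durations up to
# roughly that length are a natural choice.
model.set_generation_params(duration=5)

# Environmental sound prompts rather than musical ones.
wav = model.generate(['dog barking in the distance',
                      'sirens of an emergency vehicle'])

for idx, one_wav in enumerate(wav):
    audio_write(f'sound_{idx}', one_wav.cpu(), model.sample_rate,
                strategy='loudness')
```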
EnCodec was first released by Meta AI on October 25, 2022. The model was described in a paper titled "High Fidelity Neural Audio Compression." EnCodec consists of three parts: an encoder that compresses the raw waveform into a lower-frame-rate latent representation, a residual vector quantizer that discretizes that representation into parallel streams of tokens, and a decoder that reconstructs the waveform from the tokens.
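As a standalone codec, EnCodec can round-trip audio through the discrete token space. A minimal sketch using Meta's standalone encodec package; the 24 kHz model, the 6 kbps target bandwidth, and the file paths are illustrative choices:

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load the pretrained 24 kHz codec and pick a target bitrate; the
# residual vector quantizer uses more token streams at higher bandwidths.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # kbps

# 'input.wav' is a placeholder; resample/remix to the codec's format.
wav, sr = torchaudio.load('input.wav')
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    # Encoder + quantizer: waveform -> parallel streams of discrete tokens.
    encoded_frames = model.encode(wav[None])
    # Decoder: tokens -> reconstructed waveform.
    reconstructed = model.decode(encoded_frames)[0]

torchaudio.save('roundtrip.wav', reconstructed.cpu(), model.sample_rate)
```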