Meta has introduced AudioCraft, a framework for AI-driven audio generation that lets users create high-quality audio and music from simple text prompts.
The AudioCraft toolkit consists of three models: MusicGen, AudioGen, and EnCodec. MusicGen, trained on Meta-owned and specifically licensed music, generates musical compositions from text prompts. AudioGen, trained on publicly available sound effects, turns text prompts into environmental audio. EnCodec, a neural audio codec, underpins both; an improved EnCodec decoder now delivers higher-quality music generation with fewer artifacts.
Meta is also releasing pre-trained AudioGen models for generating environmental sounds and effects such as dogs barking, car horns honking, and footsteps on wooden floors. All AudioCraft model weights and code are open-sourced, letting practitioners explore AI-generated audio and music with their own datasets.
AudioCraft simplifies the overall design of generative audio models, making it easier for researchers and developers to customize existing models, build new ones, and push the boundaries of audio and music generation.
The framework streamlines work across music and sound generation and audio compression. Its accessible codebase encourages the development of better sound generators, compression algorithms, and music composition tools, fostering collaboration within the audio research community.
Meta envisions the AudioCraft models as an inspiration for musicians and sound designers, facilitating rapid ideation and iteration in novel ways.