Stability AI launches enhanced audio generation model, Stable Audio 2.0

By Dwaipayan Roy

Apr 03, 2024

06:29 pm

What's the story

Stability AI has introduced Stable Audio 2.0, an upgraded version of its audio generation model. The new version lets users create AI-generated songs up to three minutes long, double the maximum duration of its predecessor. The original version, launched in September 2023, was limited to clips of 90 seconds. The new three-minute limit matches the typical length of a radio-friendly song.

Enhancement

Stable Audio 2.0: Accessible and enhanced song structure

Unlike OpenAI's audio generation tool, Voice Engine, which is limited to select users, Stability AI's Stable Audio 2.0 is freely available to the public via its website, with API access to follow soon. The company emphasizes that a significant improvement over the predecessor is the ability to generate songs with a complete structure, including an intro, a progression, and an outro.

User experience

Quality and customization of AI-generated music

The quality of the AI-created music from Stable Audio 2.0 has sparked debate. A journalist from The Verge noted that while some parts of the generated song were playlist-worthy, others resembled "whale sounds." However, users can customize their projects by adjusting prompt strength and by setting how much of an uploaded audio clip will be altered. Sound effects such as crowd cheers or keyboard clicks can also be added.

Industry perspective

AI audio generation: A challenge for tech giants

The challenge of creating AI-generated music that doesn't sound strange or soulless isn't exclusive to Stability AI. Tech giants such as Google and Meta have also been experimenting with AI audio generation. However, unlike Stability AI, these companies have not yet made their models publicly accessible, as they continue to gather developer feedback.

Stability AI's approach to copyright and training data

Stable Audio's training data is sourced from AudioSparx, which has a library of over 800,000 audio files. Artists affiliated with AudioSparx were given the option to exclude their material from being used to train the model. To prevent copyright violations, Stability AI has partnered with Audible Magic, whose content recognition technology is used to detect and block copyrighted material uploaded to the platform.