Tuesday, November 5, 2024

Stability AI releases a sound generator

Stability AI, the startup behind the AI-powered art generator Stable Diffusion, has released an open AI model for generating sounds and songs that it claims was trained exclusively on royalty-free recordings.

Called Stable Audio Open, the generative model takes a text description (e.g., “Rock beat played in a treated studio, session drumming on an acoustic kit”) and outputs a recording up to 47 seconds in length. The model was trained on around 486,000 samples from the free music libraries Freesound and the Free Music Archive.

Stability AI says that the model can be used to create drum beats, instrument riffs, ambient noises and “production elements” for videos, films and TV shows as well as to “edit” existing songs or apply the style of one song (e.g. smooth jazz) to another.

“A key benefit of this open source release is that users can fine-tune the model on their own custom audio data,” Stability AI wrote in a post on its corporate blog. “For example, a drummer could fine-tune on samples of their own drum recordings to generate new beats.”

Stable Audio Open has its limitations, however. It can’t produce full songs, melodies or vocals — at least not good ones. Stability AI says that it’s not optimized for this, and suggests that users looking for those capabilities opt for the company’s premium Stable Audio service.

Stable Audio Open also can’t be used commercially; its terms of service prohibit it. And it doesn’t perform equally well across musical styles and cultures, or with descriptions in languages other than English, biases that Stability AI blames on the training data.

“The source of data is potentially lacking diversity and all cultures are not equally represented in the data set,” Stability AI writes in a description of the model. “The generated samples from the model will reflect the biases from the training data.”

Stability AI, which has long struggled to turn its flagging business around, became the subject of controversy recently after its VP of generative audio, Ed Newton-Rex, resigned over his disagreement with the company’s stance that training generative AI models on copyrighted works constitutes “fair use.” Stable Audio Open would appear to be an attempt to change that narrative, while at the same time not-so-subtly advertising Stability AI’s paid products.
