NVIDIA Unveils Fugatto AI Model to Modify Voices and Create Unique Audio for Media Production

Earlier this week, NVIDIA introduced Fugatto (short for Foundational Generative Audio Transformer Opus 1), a new artificial intelligence model designed to generate music and audio, modify voices, and create novel sounds. The technology is aimed at professionals in music, film, and video game production and is not set for immediate public release, the company stated.

Fugatto distinguishes itself from other AI technologies by not only generating audio from text descriptions but also modifying existing audio. For instance, it can transform a piano melody into a vocal line or alter the accent and mood of a spoken-word recording. It can even create unique sounds, such as a trumpet that barks like a dog.

Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, remarked: “If we think about synthetic audio over the past 50 years, music sounds different now because of computers, because of synthesizers. I think that generative AI is going to bring new capabilities to music, to video games, and to ordinary folks that want to create things.”

This development aligns with trends in generative AI technologies showcased by companies like Runway and Meta Platforms (META.O), which are also exploring audio and video generation from text prompts. NVIDIA’s move highlights its role as the world’s largest supplier of chips and software for AI systems.

Despite Fugatto’s potential, NVIDIA emphasized its cautious approach to releasing the model. Catanzaro explained: “Any generative technology always carries some risks, because people might use that to generate things that we would prefer they don’t. We need to be careful about that, which is why we don’t have immediate plans to release this.”

The model was trained using open-source data, but NVIDIA continues to deliberate on the ethical considerations of public deployment. The risks of misuse, such as generating misinformation or infringing copyrights, remain significant challenges for creators of generative AI models.

The announcement comes amid growing tension between the tech and entertainment industries over the use of AI in creative fields. While companies like OpenAI negotiate with Hollywood studios over the application of AI in entertainment, the industry faces concerns over copyright and voice imitation, exemplified by actress Scarlett Johansson’s recent allegation that OpenAI imitated her voice.

NVIDIA’s Fugatto marks a step forward in AI-driven creativity, but the company’s restrained approach underscores the ongoing debate around the ethical implications of generative AI.