Google DeepMind’s new AI tech will generate soundtracks for videos

Google’s DeepMind artificial intelligence laboratory is working on a new technology that can generate soundtracks, and even dialogue, to go along with videos. The lab has shared its progress on the video-to-audio (V2A) project, which can be paired with Google Veo and other video creation tools like OpenAI’s Sora. In its blog post, the DeepMind team explains that the system can understand raw pixels and combine that information with text prompts to create sound effects for what’s happening onscreen. The tool can also be used to make soundtracks for traditional footage, such as silent films and any other video without sound.

DeepMind’s researchers trained the technology on video, audio and AI-generated annotations that contain detailed descriptions of sounds and transcripts of dialogue. They said that by doing so, the technology learned to associate specific sounds with visual scenes. As TechCrunch notes, DeepMind’s team isn’t the first to release an AI tool that can generate sound effects (ElevenLabs released one recently as well), and it won’t be the last. “Our research stands out from existing video-to-audio solutions because it can understand raw pixels and adding a text prompt is optional,” the team writes.

While the text prompt is optional, it can be used to shape and refine the final product so that it’s as accurate and as realistic as possible. You can enter positive prompts to steer the output toward the sounds you want, for instance, or negative prompts to steer it away from the sounds you don’t want. In one of the samples it shared, the team used the prompt: “Cinematic, thriller, horror film, music, tension, ambience, footsteps on concrete.”

The researchers admit that they’re still trying to address their V2A technology’s existing limitations, like the drop in audio quality that can happen when the source video contains distortions. They’re also still working on improving lip synchronization for generated dialogue. In addition, they vow to put the technology through “rigorous safety assessments and testing” before releasing it to the world.


Mariella Moon
