Google DeepMind CEO Demis Hassabis has revealed plans to integrate the company’s Gemini foundation models with its Veo video-generating systems, aiming to enhance Gemini’s grasp of real-world physics through multimodal AI. Speaking on the Possible podcast, co-hosted by Reid Hoffman, Hassabis said Gemini was designed “to be multimodal from the beginning” as part of a broader vision for a universal digital assistant that can help users in the real world.
He explained that by training on vast quantities of YouTube video data, Veo is learning “the physics of the world.” The move reflects a wider trend in the AI industry toward “omni” models capable of synthesizing text, audio, video, and images.