Google's official blog introduces Gemini Omni as a new multimodal model family and announces its first member, Gemini Omni Flash, which targets video generation and editing; Google says the model is rolling out to the Gemini app, Google Flow, and YouTube Shorts (Google blog).
The DeepMind product page and Google's announcement describe capabilities including multi-turn conversational editing, character and scene consistency, and the ability to combine image, audio, video, and text inputs into a single output (DeepMind; Google blog).
Both DeepMind and Google's blog state that content created or edited with Omni includes an imperceptible digital watermark and that the model underwent automated and human red teaming plus ethics and safety reviews ahead of release (DeepMind; Google blog).
The Verge's hands-on review reports that Omni Flash generates plausible travel and action scenes with minimal user effort but also shows artifacts and occasional coherence failures in complex scenes (The Verge).