
This push for naturalness extends to filtering out environmental distractions. Gemini 3.1 Flash Live excels at discerning relevant speech from background noise such as traffic or television, making AI agents more reliable in real-world, often noisy, environments. The model leads with a score of 90.8% on ComplexFuncBench Audio, a benchmark for multi-step function calling with various constraints. It also scored 36.1% on Scale AI’s Audio MultiChallenge, which tests complex instruction following and long-horizon reasoning amid typical human interruptions and hesitations.
The potential impact of this increased realism is substantial. As Ars Technica reports, Gemini 3.1 Flash Live's debut could blur the lines between human and AI interaction, making it harder to discern if one is conversing with a robot. Google acknowledges this challenge, integrating SynthID, an imperceptible watermark interwoven directly into the audio output. This allows for reliable detection of AI-generated content, aiming to prevent the spread of misinformation.
For everyday users, the model delivers faster and more helpful responses in Gemini Live and Search Live. It can follow a conversation's thread for twice as long as the previous model, preserving the user’s train of thought during extended discussions. The model's multilingual capability has also enabled the global rollout of Search Live, allowing people in more than 200 countries and territories to have real-time, multimodal conversations in their preferred language. TechCrunch highlights that this expansion makes AI-powered conversational search available wherever AI Mode is supported, including real-time translation for over 70 languages on any pair of headphones.
For Developers
Use the Gemini Live API to build responsive, voice-first AI agents that handle complex, multi-step function calls. The model scored 90.8% on ComplexFuncBench Audio, a benchmark for this kind of task, and its background-noise filtering should improve the user experience in noisy environments.
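As a rough illustration of what "multi-step function calling" means on the application side, the sketch below shows a tool registry and a dispatcher in plain Python. It does not use the Gemini Live API or any Google SDK: the `tool` decorator, the `dispatch` helper, the two flight functions, and the `{"name": ..., "args": ...}` call format are all hypothetical names chosen for this example, and the flight data is a stub.

```python
from typing import Any, Callable

# Hypothetical tool registry. None of these names come from a Google SDK.
TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so an agent can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def find_flights(origin: str, destination: str) -> list[dict]:
    # Stub data standing in for a real backend lookup.
    return [{"id": "F1", "price": 120}, {"id": "F2", "price": 95}]


@tool
def book_flight(flight_id: str) -> dict:
    # Stub booking that simply echoes the requested flight.
    return {"flight_id": flight_id, "status": "booked"}


def dispatch(call: dict) -> dict:
    """Run one call of the assumed form {"name": str, "args": dict}."""
    name = call["name"]
    if name not in TOOLS:
        return {"name": name, "error": f"unknown tool: {name}"}
    try:
        return {"name": name, "result": TOOLS[name](**call.get("args", {}))}
    except TypeError as exc:
        # Missing or unexpected arguments are reported, not raised.
        return {"name": name, "error": str(exc)}


if __name__ == "__main__":
    # Step 1: search. Step 2: book, using a value from step 1's result.
    found = dispatch({"name": "find_flights",
                      "args": {"origin": "SFO", "destination": "JFK"}})
    cheapest = min(found["result"], key=lambda f: f["price"])
    print(dispatch({"name": "book_flight",
                    "args": {"flight_id": cheapest["id"]}}))
```

In a real voice agent, the model rather than the script would decide which function to request next after seeing each result. Returning errors as data instead of raising lets the agent recover mid-conversation when a call is malformed.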
For Enterprises
Integrate Gemini 3.1 Flash Live into customer experience platforms to deploy AI agents that accurately recognize acoustic nuances and adapt their tone, enhancing customer satisfaction and efficiency.
For Users
Expect faster, more natural AI conversations in Gemini Live and Search Live that hold context for longer. Search Live's global expansion brings real-time multimodal assistance and translation to more than 200 countries and territories.
For Everyone
Be aware that AI-generated audio is becoming increasingly lifelike. While Google implements SynthID watermarking for detection, maintaining a critical perspective on audio content is crucial.
Gemini 3.1 Flash Live is Google's new, high-quality audio model designed for real-time, natural-sounding AI conversations. It reduces latency, filters background noise, and understands acoustic nuances to create more human-like interactions. This model powers Gemini Live and Search Live, expanding access to real-time multimodal AI assistance globally.
Gemini 3.1 Flash Live improves AI conversations by reducing the delay between speaking and hearing a response, making interactions more fluid. It also filters out environmental distractions like traffic or television noise, focusing on relevant speech. The model scored 90.8% on ComplexFuncBench Audio, demonstrating its ability to handle multi-step function calls with constraints.
SynthID is an imperceptible audio watermark integrated into Gemini 3.1 Flash Live's audio output to detect AI-generated content. This watermark helps prevent the spread of misinformation by allowing reliable identification of AI-generated audio. Google acknowledges the challenge of distinguishing between human and AI interaction and uses SynthID to address it.
Gemini 3.1 Flash Live is available in more than 200 countries and territories through Gemini Live and Search Live. Developers can access it in preview via the Gemini Live API in Google AI Studio to build voice agents. Enterprises can also use it in Gemini Enterprise for Customer Experience.
Gemini 3.1 Flash Live excels in complex audio tasks, achieving a score of 90.8% on the ComplexFuncBench Audio benchmark. It also scored 36.1% on Scale AI’s Audio MultiChallenge, which tests complex instruction following and long-horizon reasoning amid typical human interruptions and hesitations. This demonstrates its ability to handle real-world conversational scenarios effectively.







