The revelation underscores the immediate need for content creators to adapt their strategies for the evolving search experience, especially with a new version of Google Search rolling out in 2026. This move also aligns with Google's broader strategy to integrate more conversational AI, as seen with new features announced at the recent Google I/O conference.
What the Discovery Engine Reveals
The insights into Google's AI search mechanisms come primarily from analyzing Google Cloud's Discovery Engine, the product through which Google sells the underlying AI infrastructure. By understanding how this product operates, developers can infer the logic behind Google's AI Mode and AI Overviews. Metehan Yesilyurt, known for his analysis of Perplexity's ranking factors, recently detailed these findings in a blog post, and Alex Groberman highlighted them on LinkedIn.
The exposed ranking signals combine traditional and AI-native factors:
- Base Ranking: The initial relevance score from the core algorithm.
- Gecko Score (Embedding Similarity): Measures the semantic match between content and query using vector embeddings.
- Jetstream (Cross-Attention Relevance): An advanced model that understands nuance, negation, and context.
- BM25 Keyword Matching: Traditional keyword relevance still impacts rankings.
- PCTR (Predicted Click-Through Rate): A three-tier prediction model covering popularity, PCTR, and personalized PCTR; the personalized tier unlocks after 100,000+ queries.
- Freshness: A scoring system based on content recency.
- Boost / Bury Rules: Manual adjustments for business logic.
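To make the list above more concrete, here is a minimal Python sketch of two of the named signals, embedding similarity (cosine similarity between vectors) and BM25 keyword matching, plus one way such signals could be blended. The blending step is purely illustrative: the source does not say how Google combines the signals, so the `WEIGHTS` values, the `combined_score` function, and the multiplicative `boost` factor are all assumptions. The cross-attention signal is left out because it requires a trained model.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Embedding similarity: cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bm25(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document against a tokenized corpus."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    freqs = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        containing = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - containing + 0.5) / (containing + 0.5) + 1)
        tf = freqs[term]
        denom = tf + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf * (k1 + 1) / denom
    return score


# Hypothetical weights: the source does not say how the signals are combined.
WEIGHTS = {
    "base": 0.30,
    "embedding": 0.25,
    "keyword": 0.15,
    "pctr": 0.20,
    "freshness": 0.10,
}


def combined_score(signals, boost=1.0):
    """Weighted sum of signals normalized to [0, 1], scaled by a boost/bury factor."""
    total = sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)
    return total * boost
```

In this sketch a `boost` above 1.0 promotes a result and a value below 1.0 buries it, which is one simple way to model manual business rules layered on top of learned signals.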
How Content Retrieval Is Changing
Beyond ranking, the Discovery Engine also sheds light on the retrieval pipeline, offering crucial guidance for content creators. The system processes content in distinct units, emphasizing the need for structured, digestible information. TechCrunch reports that new AI agents, introduced at Google I/O 2026, are designed to work continuously, highlighting the demand for machine-readable clarity.
Key elements of the retrieval process include:
- Content is broken into chunks, with a maximum size of 500 tokens (approximately 375 words).
- Optional ancestor headings travel with each chunk, providing context.
- Tables and images are parsed and understood.
- A layout parser combined with Gemini-enhanced understanding (LLM-augmented indexing) processes information.
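The chunking behavior described above can be approximated with a short Python sketch. It is not Google's parser: it assumes Markdown-style `#` headings as input, and it counts whitespace-separated words instead of real tokens, using the 0.75 words-per-token ratio implied by "500 tokens (approximately 375 words)". The names `chunk_document`, `MAX_WORDS`, and `HEADING` are hypothetical.

```python
import re

MAX_TOKENS = 500            # chunk limit reported in the article
WORDS_PER_TOKEN = 0.75      # implied by "500 tokens (approximately 375 words)"
MAX_WORDS = int(MAX_TOKENS * WORDS_PER_TOKEN)

# Assumption: headings are written Markdown-style, e.g. "## Setup".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_document(text, max_words=MAX_WORDS):
    """Split Markdown-like text into chunks that carry their ancestor headings."""
    chunks = []
    ancestors = []  # (level, title) pairs forming the current heading path
    buffer = []     # words collected under the current heading path

    def flush():
        # Emit the buffered words in slices of at most max_words each.
        for start in range(0, len(buffer), max_words):
            chunks.append({
                "headings": [title for _, title in ancestors],
                "text": " ".join(buffer[start:start + max_words]),
            })
        buffer.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level, then push the new one.
            while ancestors and ancestors[-1][0] >= level:
                ancestors.pop()
            ancestors.append((level, match.group(2).strip()))
        else:
            buffer.extend(line.split())
    flush()
    return chunks
```

Because every chunk carries its heading path, a passage retrieved on its own still says which section it came from. That is a practical argument for descriptive headings and for sections short enough to fit in a single chunk.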
Why Structured Data Is Now Critical
The exposed workings confirm that schema markup, or structured data, is more vital than ever. The Discovery Engine processes structured data with specific internal flags, a clear signal that it helps AI systems understand meaning, relationships, and entities within content. As one commenter noted, if machines are trying to understand content, providing cleaner signals through structured data simply helps them do their job.
This shift signifies a departure from traditional SEO tactics focused purely on keywords and link building. Google's AI search now prioritizes content architecture, intent alignment, and machine-readable clarity. Queries to Google's AI Mode have notably doubled every quarter since its debut a year ago, indicating rapid user adoption of these new AI-driven search experiences.
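For readers who have not worked with schema markup, the following Python sketch shows what a basic structured-data block might look like. It builds a schema.org `Article` description as JSON-LD, wrapped in the script tag commonly used to embed it in a page. The helper name `article_jsonld` and the sample values are placeholders, and the article does not say which schema types or properties the Discovery Engine actually reads.

```python
import json


def article_jsonld(headline, author_name, date_published):
    """Build a schema.org Article description as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }
    # Escape "</" so a value can never close the script tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values, for illustration only.
print(article_jsonld("Example headline", "Jane Doe", "2026-01-15"))
```

The markup states explicitly what the page is, who wrote it, and when, so a parser does not have to infer those entities from the prose.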








