How AI Search Works: Why It's Randomised

Ask an AI search engine the same question twice and you can get two different answers back. Not because the model changed its mind, but because randomness is designed into nearly every stage of the pipeline between your prompt and its response. Not a glitch. A design decision, made six times over. Here is where each one happens.

User prompt · example

"I need a new pair of Nike running shoes for a middle aged tall man with good arch support."

Step 01

Query understanding & decomposition

The model parses intent, entities, and context:

Entity: Nike (brand)
Category: running shoes
Attributes: middle aged, tall man, good arch support

Step 02

Query fan-out: sub-searches generated

The model generates multiple retrieval queries:

best Nike running shoes for tall men

Nike shoes with arch support for overpronation

Nike men's size 13 running shoes review
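To make the hand-off from Step 01 to Step 02 concrete, here is a minimal sketch. It assumes a template-based stand-in for what is really a model-generated step; `ParsedQuery`, `fan_out`, and the templates are hypothetical names chosen for illustration, not any engine's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical container for the Step 01 output."""
    brand: str
    category: str
    attributes: list = field(default_factory=list)


def fan_out(parsed: ParsedQuery) -> list:
    """Build one general sub-query plus one per attribute.

    Assumption: fixed templates stand in for model-generated queries,
    so this version is deterministic where the real step is not.
    """
    base = f"{parsed.brand} {parsed.category}"
    queries = [f"best {base}"]
    queries += [f"{base} for {attr}" for attr in parsed.attributes]
    return queries


parsed = ParsedQuery(
    brand="Nike",
    category="running shoes",
    attributes=["middle aged", "tall man", "good arch support"],
)
sub_queries = fan_out(parsed)
```

In a real pipeline the model writes these sub-queries itself, which is why the set can differ from one run to the next.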

Step 03

Retrieval: the RAG pipeline

Each fan-out query hits the search index:

Web pages are chunked (segmented)
Chunks are embedded (converted to vectors)
Top-K most semantically similar chunks are retrieved
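The three retrieval steps above can be sketched in a few lines. This is an illustration under stated assumptions: a toy word-count vector stands in for a learned embedding model, cosine similarity stands in for whatever similarity measure a given engine uses, and the function names and sample pages are hypothetical.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8) -> list:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Placeholder page snippets, each treated as one chunk.
pages = [
    "Nike running shoes with arch support for tall men",
    "Recipe for sourdough bread with rye flour",
    "Running shoes review arch support stability",
]
results = top_k("Nike running shoes arch support", pages, k=2)
```

Note how close the two surviving chunks can score: small changes to the query wording or to K are enough to change which chunks make the cut.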

Step 04

Context assembly

Retrieved chunks are injected into the context window:

System prompt
Retrieved chunks (ranked by relevance)
User prompt
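A rough sketch of that assembly order follows. The word budget is an assumption added for illustration (real systems count tokens, not words), and `assemble_context` and the placeholder chunk texts are hypothetical.

```python
def assemble_context(system_prompt: str, ranked_chunks: list,
                     user_prompt: str, max_words: int = 60) -> str:
    """Join system prompt, ranked chunks, and user prompt, in that order.

    Assumption: a simple word budget stands in for a token limit.
    Chunks are added best-first until the next one would not fit.
    """
    budget = max_words - len(system_prompt.split()) - len(user_prompt.split())
    kept = []
    for text in ranked_chunks:  # highest-ranked chunk first
        cost = len(text.split())
        if cost > budget:
            break  # lower-ranked chunks are dropped
        kept.append(text)
        budget -= cost
    return "\n\n".join([system_prompt, *kept, user_prompt])


context = assemble_context(
    system_prompt="You are a shopping assistant.",
    ranked_chunks=[
        "Chunk one about Pegasus cushioning",
        "Chunk two about Structure support",
        "Chunk three about Vomero",
    ],
    user_prompt="Which Nike shoe has arch support?",
    max_words=22,
)
```

With this budget the lowest-ranked chunk never reaches the model, so anything it said cannot appear in the answer.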

Step 05 · the core

Token generation

The model predicts the next token based on the assembled context, one token at a time.

● The random core: token sampling

The model outputs a probability distribution over its entire vocabulary for the next token
The most probable token (the greedy choice) is rarely the only plausible one
Temperature and top-p sampling control how a token is drawn from that distribution

P("Pegasus") = 0.4
P("Vomero") = 0.3
P("ZoomX") = 0.2

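Same query, two runs: at temperature = 0.7 the sampler may pick “Pegasus”; at temperature = 0.9, where the distribution is flatter, it may pick “Vomero”. Even at a fixed temperature above zero, repeated runs can differ.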
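Here is a minimal sketch of temperature and top-p sampling over the example probabilities. Two assumptions are worth flagging: the `"<other>"` entry is added so the distribution sums to 1, and temperature is applied as p^(1/T) followed by renormalisation, which is equivalent to dividing the logits by T. Function names are hypothetical.

```python
import random
from typing import Optional


def apply_temperature(probs: dict, temperature: float) -> dict:
    """Rescale each probability to p ** (1 / T), then renormalise.

    Lower T sharpens the distribution; higher T flattens it.
    """
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: s / total for tok, s in scaled.items()}


def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the smallest set of top tokens whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, p in ranked:
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_token(probs: dict, temperature: float = 0.7, top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> str:
    """Draw one token after temperature scaling and top-p filtering."""
    rng = rng or random.Random()
    dist = top_p_filter(apply_temperature(probs, temperature), top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Probabilities from the example; "<other>" is an assumed remainder.
next_token_probs = {"Pegasus": 0.4, "Vomero": 0.3, "ZoomX": 0.2, "<other>": 0.1}
token = sample_token(next_token_probs, temperature=0.9)
```

Call `sample_token` repeatedly with the same inputs and the result varies. Temperature changes how often each token wins, not whether the draw is random.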

Step 06

Final response

"Based on your requirements, the Nike Air Zoom Structure 24 offers excellent arch support and is available in extended sizes…"

What this means if you're optimising for AI search

None of this is a bug you can patch. It is the architecture. Six separate places inject their own variance: query parsing, fan-out, retrieval, context assembly, sampling, and citation. And the variance compounds. That is why chasing a single "AI search ranking" the way you would chase a Google position is the wrong mental model. There is no rank to hold, only odds: of being retrieved, assembled, and sampled into the answer often enough to matter. You are not optimising for a position. You are loading dice.
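The "odds, not rank" framing can be put into rough numbers. The sketch below assumes the stages are independent and uses made-up per-stage probabilities purely for illustration; neither the figures nor the function names come from any real measurement or tool.

```python
import math


def appearance_odds(stage_probs: list) -> float:
    """Chance of surviving every stage, assuming independent stages."""
    return math.prod(stage_probs)


def at_least_once(p: float, runs: int) -> float:
    """Chance of appearing in at least one of `runs` independent answers."""
    return 1.0 - (1.0 - p) ** runs


# Hypothetical odds of being retrieved, assembled, and sampled.
illustrative_stages = [0.5, 0.8, 0.4]
per_answer = appearance_odds(illustrative_stages)
over_ten_runs = at_least_once(per_answer, runs=10)
```

Raising the odds at any single stage lifts the product, which is the practical meaning of loading the dice.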
