Ask an AI search engine the same question twice and you can get two different answers back. Not because the model changed its mind, but because randomness is designed into nearly every stage of the pipeline between your prompt and its response. Not a glitch. A design decision, made six times over. Here is where each one happens.

User prompt · example

"I need a new pair of Nike running shoes for a middle aged tall man with good arch support."

Step 01
Query understanding & decomposition

The model parses intent, entities, and context:

  • Entity: Nike (brand)
  • Category: running shoes
  • Attributes: middle aged, tall man, good arch support
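
As a rough sketch, the parsed result can be pictured as a small structured record. The `ParsedQuery` class and its field names are hypothetical, chosen here for illustration; the values are filled in by hand from the example prompt, not produced by any real parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParsedQuery:
    """Hypothetical container for what Step 01 extracts from a prompt."""
    brand: str
    category: str
    attributes: List[str] = field(default_factory=list)


# Hand-written result for the example prompt. In a real system the model
# produces this, and its reading of the prompt can differ between runs.
parsed = ParsedQuery(
    brand="Nike",
    category="running shoes",
    attributes=["middle aged", "tall man", "good arch support"],
)
print(parsed)
```

Even at this stage there is room for variation: a different run might read "tall" as a sizing hint instead of a descriptive attribute.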
Step 02
Query fan-out: sub-searches generated

The model generates multiple retrieval queries:

best Nike running shoes for tall men
Nike shoes with arch support for overpronation
Nike men's size 13 running shoes review
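
The fan-out step can be imitated with a toy stand-in. A real engine has a language model write these queries; the sketch below only assumes a handful of fixed templates and draws a random subset, to show how two runs can search for different things. The `fan_out` function and its templates are illustrative assumptions.

```python
import random


def fan_out(brand, category, attributes, k=3, seed=None):
    """Toy fan-out: build candidate queries, then sample k of them."""
    rng = random.Random(seed)  # seed=None gives a different draw each run
    candidates = [f"best {brand} {category} for {a}" for a in attributes]
    candidates.append(f"{brand} {category} review")
    candidates.append(f"{brand} {category} with {attributes[-1]}")
    return rng.sample(candidates, k)


attrs = ["middle aged", "tall man", "good arch support"]
print(fan_out("Nike", "running shoes", attrs))
print(fan_out("Nike", "running shoes", attrs))  # may differ from the first call
```

Fixing `seed` makes the draw repeatable, which is useful for testing but is not how a production system would normally behave.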
Step 03
Retrieval: the RAG pipeline

Each fan-out query hits the search index:

  • Web pages are chunked (segmented)
  • Chunks are embedded (converted to vectors)
  • Top-K most semantically similar chunks are retrieved
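
The retrieval step above can be sketched with cosine similarity over toy vectors. The three-dimensional vectors and chunk texts are made up for illustration; real embeddings come from an embedding model and have far more dimensions, and the similarity measure a given engine uses is an assumption here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, chunks, k=2):
    """chunks is a list of (text, vector); return the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy embeddings, invented for this example.
chunks = [
    ("Pegasus review: cushioning and arch support", [0.9, 0.1, 0.3]),
    ("Vomero sizing guide for tall runners", [0.2, 0.9, 0.1]),
    ("History of the Nike swoosh logo", [0.1, 0.1, 0.9]),
]
query = [0.8, 0.3, 0.1]
print(top_k(query, chunks, k=2))
```

Because each fan-out query has its own vector, a change in the wording of a sub-search changes which chunks land in the top K.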
Step 04
Context assembly

Retrieved chunks are injected into the context window:

  • System prompt
  • Retrieved chunks (ranked by relevance)
  • User prompt
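
A minimal sketch of the assembly step, assuming the context is a single string built in the order listed above. The `assemble_context` function, the `max_chunks` cutoff, and the "Sources" layout are hypothetical; real systems use their own templates and token budgets.

```python
def assemble_context(system_prompt, ranked_chunks, user_prompt, max_chunks=3):
    """Join system prompt, the top-ranked chunks, and the user prompt."""
    kept = ranked_chunks[:max_chunks]  # chunks past the cutoff never reach the model
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(kept, start=1))
    return f"{system_prompt}\n\nSources:\n{sources}\n\nUser: {user_prompt}"


context = assemble_context(
    system_prompt="Answer using only the sources provided.",
    ranked_chunks=["Pegasus review", "Vomero sizing guide", "Structure arch test", "Swoosh history"],
    user_prompt="I need a new pair of Nike running shoes with good arch support.",
    max_chunks=3,
)
print(context)
```

The cutoff is where a page can be retrieved and still lose: the fourth chunk in this example is ranked, but it is never shown to the model.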
Step 05 · the core
Token generation

The model predicts the next token based on the assembled context, one token at a time.

● The random core: token sampling
  • The model outputs a probability distribution over the entire vocabulary for the next token
  • The most likely token (the greedy choice) is rarely the only plausible one
  • Temperature and Top-P sampling decide which token is actually drawn from that distribution

Example next-token probabilities:

  • P("Pegasus") = 0.4
  • P("Vomero") = 0.3
  • P("ZoomX") = 0.2

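Same query, two runs: one samples “Pegasus”, the next samples “Vomero”. Raising the temperature (for example from 0.7 to 0.9) flattens the distribution, so lower-probability tokens are drawn more often.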
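
Temperature sampling can be sketched in a few lines. The three probabilities are the ones from the example; the `<other>` entry holding the remaining 0.1 is an assumption added so the distribution sums to 1, and the function names are illustrative. Top-P filtering is left out to keep the sketch short.

```python
import random


def apply_temperature(probs, temperature):
    """Rescale each probability to p ** (1 / T), then renormalise.

    This is equivalent to dividing the logits by T before the softmax.
    """
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: s / total for tok, s in scaled.items()}


def sample_token(probs, temperature, rng):
    """Draw one token at random from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    tokens = list(adjusted)
    return rng.choices(tokens, weights=[adjusted[t] for t in tokens], k=1)[0]


# "<other>" is an assumed catch-all for the rest of the vocabulary.
probs = {"Pegasus": 0.4, "Vomero": 0.3, "ZoomX": 0.2, "<other>": 0.1}
rng = random.Random()  # unseeded, so every run can differ

for t in (0.7, 0.9):
    print(t, [sample_token(probs, t, rng) for _ in range(5)])
```

Lower temperatures concentrate probability on "Pegasus" and higher ones spread it out, but at either setting the draw is random, so "Vomero" can appear at 0.7 and "Pegasus" at 0.9.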

Step 06
Final response

"Based on your requirements, the Nike Air Zoom Structure 24 offers excellent arch support and is available in extended sizes…"

None of this is a bug you can patch. It is the architecture. Six separate places inject their own variance: query parsing, fan-out, retrieval, context assembly, sampling, and the final response with whatever it cites. And the variance compounds. That is why chasing a single "AI search ranking" the way you would chase a Google position is the wrong mental model. There is no rank to hold, only odds: of being retrieved, assembled, and sampled into the answer often enough to matter. You are not optimising for a position. You are loading dice.