The SEO industry has a persistent problem: it conflates technical requirements with ranking factors, and both with web design best practices. A website can be poorly structured, have messy HTML, lack proper heading hierarchy, and still rank well in Google. Conversely, a beautifully designed, semantically perfect website with no original content is very unlikely to rank. The distinction matters because it determines where you should invest your effort.

This article separates what Google actually requires from what actually drives rankings, grounded entirely in primary source documentation from Google Search Central and official guidelines. Every recommendation is verifiable and backed by Google's own words.

What Google Actually Requires for Indexation

Google's technical requirements are remarkably minimal. According to Google Search Central, there are only three conditions a page must meet to be eligible for indexation [1]:

First, Googlebot must not be blocked. The page must be publicly accessible and not blocked by robots.txt, login requirements, or other access restrictions. Second, the page must return an HTTP 200 (success) status code. Error pages (4xx, 5xx) are not indexed. Third, the page must have indexable content in a file type Google supports, and it must not violate Google's spam policies [1].
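
The three conditions can be sketched as a small pre-flight check. This is a minimal illustration, not Google's logic: the function name `is_eligible` and the short `INDEXABLE_TYPES` set are assumptions made for the example, the standard-library robots.txt parser does not implement Google's wildcard matching, and spam-policy compliance cannot be checked mechanically at all.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset only (assumption); Google supports many more file types.
INDEXABLE_TYPES = {"text/html", "application/pdf", "text/plain"}


def is_eligible(url: str, status_code: int, content_type: str, robots_txt: str) -> tuple[bool, str]:
    """Return (eligible, reason) for the three conditions described above."""
    # Condition 1: Googlebot must not be blocked by robots.txt.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch("Googlebot", url):
        return False, "blocked by robots.txt"

    # Condition 2: the page must return HTTP 200.
    if status_code != 200:
        return False, f"HTTP status {status_code}"

    # Condition 3: the content must be in a supported file type.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in INDEXABLE_TYPES:
        return False, f"unsupported content type {media_type}"

    return True, "eligible"
```

The inputs are passed in rather than fetched so the check can run against crawl data you already have; a page that passes is only eligible, not guaranteed to be indexed.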

That is the entire technical requirement. Google explicitly states: "There are actually very few technical things you need to do to a web page; most sites pass the technical requirements without even realizing it" [1].

This means a page can have no H1 tag, messy HTML, poor heading hierarchy, and still meet these requirements. Google is remarkably capable of crawling and understanding poorly structured pages. The absence of semantic HTML, heading hierarchy, or clean code is not a technical violation; it is a web design problem, not an SEO problem.

What Actually Drives Search Rankings

Once a page meets the technical eligibility requirements, Google's ranking systems determine whether and how prominently it appears in search results. The primary ranking factors are fundamentally different from technical requirements.

Content Quality and Helpfulness

Google's automated ranking systems are designed to prioritise helpful, reliable information created to benefit people, not content created to manipulate search engine rankings [2]. This is the single most important ranking factor.

Content must provide original information, reporting, research, or analysis. If drawing on other sources, it must avoid simply copying or rewriting, instead providing substantial additional value [2]. The content should provide a comprehensive description of the topic, avoiding thin or superficial treatments. Avoid producing lots of content on many different topics in the hope that some might perform well, and do not write to a specific word count under the mistaken belief that Google has a preferred length [2].

Query Relevance and Keyword Placement

Google uses the words people search for to understand page relevance. Placing these words in prominent locations helps Google understand the page's topic [3].

Use descriptive, keyword-relevant titles and main headings that clearly communicate the page's topic. Place relevant keywords in alt text, link text, and other descriptive locations where they naturally fit [3]. However, keyword stuffing (repeating keywords unnaturally or in lists) is a spam violation and will harm rankings [4].
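
As a rough illustration of checking prominent locations, the sketch below reports which words of a target query are absent from a page's title and main heading. The names `ProminentTextParser` and `missing_terms` are hypothetical, and the substring match is deliberately naive. It is a presence check only; it says nothing about how often a term should appear, and repeating terms to satisfy it would be keyword stuffing.

```python
from html.parser import HTMLParser


class ProminentTextParser(HTMLParser):
    """Collect the text of <title> and <h1> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._current = None  # tag currently being captured, if any
        self.text = {"title": "", "h1": ""}

    def handle_starttag(self, tag, attrs):
        if tag in self.text:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.text[self._current] += data


def missing_terms(html: str, query: str) -> dict[str, list[str]]:
    """For each prominent location, list the query words that do not appear in it."""
    parser = ProminentTextParser()
    parser.feed(html)
    words = query.lower().split()
    # Naive substring match: "shoe" counts as present in "shoes".
    return {
        location: [w for w in words if w not in text.lower()]
        for location, text in parser.text.items()
    }
```

An empty list for a location means every query word appears there; a full list suggests the title or heading does not communicate the topic the page is meant to cover.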

E-E-A-T Signals for YMYL Content Only

This is where the industry creates the most confusion. Google's automated systems identify factors that can help determine which content demonstrates Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T). Critically, E-E-A-T receives heightened weight only for "Your Money or Your Life" (YMYL) topics, meaning content that could significantly impact health, financial stability, safety, or societal welfare [2].

For non-YMYL content, E-E-A-T is not a primary ranking factor. Instead, the focus should remain on creating helpful, reliable, people-first content that serves the user's intent.

For YMYL content, it should be self-evident to visitors who authored the content. Pages should carry bylines that lead to further information about the author's background and expertise [2]. It is helpful to readers to know how a piece of content was produced. For example, product reviews should explain the testing methodology and provide evidence of the work involved [2]. The content must be created primarily to help people; using automation or AI-generation to produce content for the primary purpose of manipulating search rankings is a violation of Google's spam policies [2].

For all content, focus on creating content that is original, comprehensive, well-written, and free of factual errors. Content should be produced with care and attention, not mass-produced or outsourced without quality oversight [2].

Links: Discovery and Relevance

Links remain a fundamental signal used by Google to discover new pages and determine the relevancy of content [5]. The vast majority of new pages Google finds every day are discovered through links [3].

Google can generally only crawl links if they are an <a> HTML element with an href attribute. Links relying on script events (e.g., onclick) without an href cannot be reliably extracted [5]. Anchor text must be descriptive, reasonably concise, and relevant to both the current page and the linked page. Avoid generic text like "click here" or "read more" [5]. Every important page on the site should have at least one internal link pointing to it from another page [5].
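
The two checks described above (an anchor element with an href, and descriptive anchor text) can be sketched as a small audit. This is an illustrative example rather than a reproduction of how Googlebot extracts links; `LinkAuditor`, `audit_links`, and the `GENERIC_ANCHORS` set are names assumed for the example, and the set contains only the two phrases mentioned in the text.

```python
from html.parser import HTMLParser

# Generic phrases named in the text; extend as needed (assumption).
GENERIC_ANCHORS = {"click here", "read more"}


class LinkAuditor(HTMLParser):
    """Flag <a> elements without an href and links with generic anchor text."""

    def __init__(self) -> None:
        super().__init__()
        self._in_anchor = False
        self._href = None
        self._text = []
        self.issues = []  # list of (problem, detail) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_anchor:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or not self._in_anchor:
            return
        text = " ".join("".join(self._text).split())  # collapse whitespace
        if not self._href:
            self.issues.append(("no href", text))
        elif text.lower() in GENERIC_ANCHORS:
            self.issues.append(("generic anchor text", self._href))
        self._in_anchor = False


def audit_links(html: str) -> list[tuple[str, str]]:
    auditor = LinkAuditor()
    auditor.feed(html)
    return auditor.issues
```

Run against a page template, the audit surfaces script-only navigation (an anchor with onclick but no href) and links whose text tells neither users nor Google what the target page is about.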

External links should represent real references and earned credibility. Buying or selling links for ranking purposes, excessive link exchanges, or using automated programs to create links are explicit violations of Google's spam policies [4]. Use the rel="nofollow" attribute when linking to a source you do not trust. Use rel="sponsored" for paid links or advertisements, and rel="ugc" for user-generated content links [5].
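
One way to keep these attributes consistent is to choose the rel value from the kind of link at render time. The sketch below assumes a hypothetical `render_link` helper and four illustrative link kinds; only the rel values themselves (nofollow, sponsored, ugc) come from Google's guidance.

```python
from html import escape

# Link kinds are labels invented for this example; the rel values are Google's.
REL_BY_LINK_KIND = {
    "editorial": None,          # a real reference you vouch for: no rel needed
    "untrusted": "nofollow",    # a source you do not trust
    "paid": "sponsored",        # paid links or advertisements
    "user_generated": "ugc",    # links inside user-generated content
}


def render_link(href: str, text: str, kind: str = "editorial") -> str:
    """Render an <a> element with the rel attribute that matches the link kind."""
    rel = REL_BY_LINK_KIND[kind]
    rel_attr = f' rel="{rel}"' if rel else ""
    return f'<a href="{escape(href, quote=True)}"{rel_attr}>{escape(text)}</a>'
```

For example, `render_link("https://example.com/offer", "Partner offer", "paid")` produces an anchor carrying rel="sponsored", while the default editorial kind produces a plain link.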

Mobile-First Indexing: The Mobile Version Is the Primary Version

Google uses the mobile version of a site's content for indexing and ranking [6]. This is not a secondary consideration; it is the primary indexing method.

Ensure that the mobile site contains the exact same primary content as the desktop site. Differences in the Document Object Model (DOM) or layout can result in Google understanding the content differently [6]. The <title> element and meta descriptions must be equivalent across both mobile and desktop versions [6]. Do not block Googlebot from crawling essential resources (CSS, JavaScript, images) using the robots.txt file [6].
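
A simple parity check for the title and meta description might look like the following. It is a sketch under stated assumptions: the helper names are hypothetical, the two HTML strings are assumed to have been fetched already with desktop and mobile user agents, and exact string equality is a stricter test than the "equivalent" standard Google describes.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Extract the <title> text and the meta description from a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def head_metadata(html: str) -> dict[str, str]:
    parser = HeadMetadataParser()
    parser.feed(html)
    return {"title": " ".join(parser.title.split()), "description": parser.description}


def parity_mismatches(desktop_html: str, mobile_html: str) -> list[str]:
    """Return the metadata fields that differ between the two versions."""
    desktop, mobile = head_metadata(desktop_html), head_metadata(mobile_html)
    return [field for field in desktop if desktop[field] != mobile[field]]
```

An empty result means both versions expose the same title and description; any field listed is worth reviewing by hand, since a shortened mobile description may or may not be equivalent in meaning.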

Artificial Intelligence and Agentic Engine Optimisation

The integration of generative AI into search results (e.g., AI Overviews) relies on the same core Search ranking systems used to retrieve relevant, up-to-date pages [7]. This means optimising for AI search does not require a separate strategy.

AI systems retrieve and synthesise content based heavily on structural signals. However, this does not mean creating artificial structures for AI; rather, it means ensuring that content structure reflects the actual information hierarchy [7]. Implement structured data (Schema.org) to provide explicit clues about the meaning of a page. This is a machine-readable layer that AI systems use to understand and extract information. Google recommends JSON-LD format [8].
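
To make the JSON-LD recommendation concrete, the sketch below builds a Schema.org Article block and wraps it in the script element that carries it in a page. The function name `article_json_ld` and the choice of three properties are assumptions made for brevity; real pages should include only properties that describe content actually visible on the page.

```python
import json


def article_json_ld(headline: str, author_name: str, date_published: str) -> str:
    """Return a <script> element containing Article structured data as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date, e.g. "2024-01-15"
    }
    # Escape "</" so a value can never close the surrounding script element;
    # "\/" is a valid JSON escape, so the payload still parses unchanged.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Generating the block from the same data that renders the visible page keeps the structured data and the content in step, which is the point of reflecting the actual information hierarchy rather than inventing one.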

Organise content with clear headings and paragraphs. Content structured as clearly labelled questions and answers maps directly to how AI systems extract and synthesise responses [7]. Do not treat AI search as a separate channel requiring special files (e.g., llms.txt), artificial chunking, or inauthentic mentions. Optimise for AI by strengthening core crawlability, indexability, and content usefulness [7].

What to Avoid: Spam Policies

Google's spam policies detail the behaviours and tactics that can lead to lower ranking or complete removal from search results [4].

Cloaking (presenting different content to users and search engines with the intent to manipulate rankings) is a violation [4]. Doorway abuse (creating multiple pages or sites designed to rank for specific queries and funnel users to a final destination) is prohibited [4]. Hidden text and link abuse (placing content on a page solely to manipulate search engines using CSS, HTML, or other techniques) will result in penalties [4]. Keyword stuffing (filling a page with keywords or numbers in an attempt to manipulate rankings) is a violation [4]. Link spam (buying or selling links, excessive link exchanges, or using automated programs to create links) will harm your site [4].

The Critical Distinction: Requirements vs. Ranking Factors vs. Best Practices

The SEO industry often treats these three categories as equivalent, but they are not. Understanding the distinction is essential for prioritising your effort effectively.

Technical requirements are the bare minimum for eligibility. A page must be accessible to Googlebot, return a 200 status code, and contain indexable content. Meeting these requirements does not guarantee indexation or ranking; it only makes the page eligible.

Ranking factors are the signals that determine whether and how prominently a page appears in search results. Content quality, keyword relevance, links, and mobile-first indexing parity are primary ranking factors. E-E-A-T signals are ranking factors only for YMYL content.

Best practices are web design and user experience improvements that contribute to overall site quality but are not among the primary ranking factors described above. Responsive design, heading hierarchy (H1-H6), clean HTML structure, descriptive URLs, and page speed fall into this category. They make a site more maintainable, more usable, and more professional, but they do not substitute for helpful content, relevance, and links.

A common mistake is investing heavily in heading hierarchy, semantic HTML, and site structure in the belief that these are ranking factors. They are not. A page with no H1 tag and messy HTML can rank well if it has original, helpful content and earns links. Conversely, a perfectly structured page with no original content is very unlikely to rank.

The most impactful investments are in content quality, keyword relevance, and link acquisition. These are the primary ranking factors. Invest in web design best practices after you have addressed the ranking factors.

References

[1] Google Search Central. (n.d.). Google Search technical requirements. Retrieved from https://developers.google.com/search/docs/essentials/technical

[2] Google Search Central. (n.d.). Creating helpful, reliable, people-first content. Retrieved from https://developers.google.com/search/docs/fundamentals/creating-helpful-content

[3] Google Search Central. (n.d.). Google Search Essentials. Retrieved from https://developers.google.com/search/docs/essentials

[4] Google Search Central. (n.d.). Spam policies for Google web search. Retrieved from https://developers.google.com/search/docs/essentials/spam-policies

[5] Google Search Central. (n.d.). Link best practices for Google. Retrieved from https://developers.google.com/search/docs/crawling-indexing/links-crawlable

[6] Google Search Central. (n.d.). Mobile site and mobile-first indexing best practices. Retrieved from https://developers.google.com/search/docs/crawling-indexing/mobile/mobile-sites-mobile-first-indexing

[7] Google Search Central. (n.d.). Optimizing your website for generative AI features on Google Search. Retrieved from https://developers.google.com/search/docs/fundamentals/ai-optimization-guide

[8] Google Search Central. (n.d.). Introduction to structured data markup in Google Search. Retrieved from https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data