SEO May 16, 2026 5 min read

What HTTP Headers Does Google Actually Look For?

Find out exactly what HTTP headers Google looks for when crawling and indexing your site, and how to audit them with practical examples.

Googlebot reads more than your HTML. Before it ever parses a <title> tag or follows a link, it consumes a stack of HTTP response headers that quietly decide whether your page gets crawled, indexed, cached, or ignored. Misconfigure one and a perfectly good page can vanish from search results without a single visible error.

Here's exactly which HTTP headers Google pays attention to, what values it expects, and how to verify yours are sending the right signals.

The Headers Googlebot Actually Reads

Google has confirmed support for a specific set of response headers across its developer documentation and Search Central blog. These are the ones that materially affect crawling and indexing:

  • X-Robots-Tag — indexing directives at the HTTP level
  • Link (rel="canonical") — canonical URL signal
  • Content-Type — MIME type and character encoding
  • Content-Encoding — gzip, br, deflate compression
  • Content-Length — payload size hints
  • Cache-Control / Expires / ETag / Last-Modified — caching and conditional requests
  • Vary — content negotiation signals
  • Retry-After — backoff behaviour on 429/503
  • Location — redirect destination on 3xx responses
  • WWW-Authenticate — authentication challenges (and the trap that comes with them)

X-Robots-Tag: The Header That Controls Indexing

If you only audit one header, make it this one. X-Robots-Tag applies the same directives as the <meta name="robots"> tag, but at the HTTP level — which means it works for PDFs, images, and other non-HTML resources.

Common directives Google supports

  • noindex — exclude from the index
  • nofollow — don't follow links on the page
  • none — equivalent to noindex, nofollow
  • noarchive — no cached copy
  • nosnippet — no description snippet
  • max-snippet:[number], max-image-preview:[setting], max-video-preview:[number]
  • unavailable_after:[date] — drop after a date

Example for a downloadable PDF you don't want indexed

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow

You can also target specific crawlers: X-Robots-Tag: googlebot: noindex. The biggest mistake I see is leaving a noindex header on a staging environment that gets promoted to production. Always check this header before launch.
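As a rough sketch of how an audit script might interpret this header, the Python snippet below splits one X-Robots-Tag value into an optional crawler name and its directives. The function names and the KNOWN_DIRECTIVES set are illustrative assumptions, not part of any Google library, and the parser does not handle unavailable_after dates that contain commas.

```python
# Directive names taken from the list above; anything else before the
# first colon is treated as a crawler name (an assumption of this sketch).
KNOWN_DIRECTIVES = {
    "noindex", "nofollow", "none", "noarchive", "nosnippet",
    "max-snippet", "max-image-preview", "max-video-preview",
    "unavailable_after",
}


def parse_x_robots_tag(value):
    """Split one header value into (crawler, [directives])."""
    head, sep, rest = value.partition(":")
    name = head.strip().lower()
    if sep and "," not in head and name not in KNOWN_DIRECTIVES:
        crawler, body = name, rest   # e.g. "googlebot: noindex"
    else:
        crawler, body = "*", value   # applies to every crawler
    directives = [d.strip().lower() for d in body.split(",") if d.strip()]
    return crawler, directives


def blocks_indexing(header_values, crawler="googlebot"):
    """True if any value that applies to `crawler` carries noindex or none."""
    for value in header_values:
        target, directives = parse_x_robots_tag(value)
        if target in ("*", crawler) and {"noindex", "none"} & set(directives):
            return True
    return False


print(blocks_indexing(["noindex, nofollow"]))
print(blocks_indexing(["bingbot: noindex"]))
```

A response can carry several X-Robots-Tag headers, which is why blocks_indexing takes a list of values rather than a single string.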

Canonical via the Link Header

For non-HTML files like PDFs, you can't use <link rel="canonical"> in the document. Google instead reads the Link response header:

Link: <https://example.com/whitepaper.pdf>; rel="canonical"

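For non-HTML files this is the most direct way to consolidate duplicate PDF URLs, syndicated documents, or assets served from a CDN under a different hostname, since there is no HTML head to carry a canonical tag. Google treats it as a strong hint rather than a directive.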
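To check this header in a script, something like the following Python sketch could pull the canonical target out of a Link value. The regular expressions cover the common cases only (they are not a full parser for the Link header grammar), and canonical_from_link is a hypothetical helper name.

```python
import re

# One link-value: <URL> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)'
)
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def canonical_from_link(header_value):
    """Return the URL whose rel includes "canonical", or None."""
    for link in LINK_RE.finditer(header_value):
        url, params = link.group(1), link.group(2)
        for param in PARAM_RE.finditer(params):
            name = param.group(1).lower()
            # group(2) is a quoted value, group(3) an unquoted one
            value = param.group(2) if param.group(2) is not None else param.group(3)
            if name == "rel" and "canonical" in value.lower().split():
                return url
    return None


print(canonical_from_link('<https://example.com/whitepaper.pdf>; rel="canonical"'))
```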

Content-Type and Character Encoding

Google uses Content-Type to decide how to parse the response. A missing or wrong charset can cause garbled snippets in search results, especially for non-Latin scripts.

Send this for HTML pages:

Content-Type: text/html; charset=UTF-8

If the charset is missing in the header and not declared in the HTML, Google falls back to guessing — and guessing wrong on Cyrillic, Arabic, or East Asian content is common.
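A minimal way to inspect this header in Python is to lean on the standard library's email.message parser, which understands the same type-plus-parameters syntax. The helper name content_type_parts is an assumption made for this sketch.

```python
from email.message import Message


def content_type_parts(header_value):
    """Return (mime_type, charset); charset is None when not declared."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


mime, charset = content_type_parts("text/html; charset=UTF-8")
print(mime, charset)

# A header with no charset parameter leaves the encoding to guesswork.
print(content_type_parts("text/html"))
```

get_content_charset lowercases its result, so compare against "utf-8" rather than "UTF-8".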

Caching Headers Googlebot Respects

Googlebot supports conditional requests, which saves your crawl budget on large sites. The relevant headers are:

  • Last-Modified — Googlebot may send If-Modified-Since on subsequent requests; respond with 304 Not Modified when the content hasn't changed.
  • ETag — paired with If-None-Match for the same purpose.
  • Cache-Control — max-age can help Google's crawlers decide when a URL is worth recrawling; values like no-cache and private mainly govern browsers and intermediate caches.

For a site with hundreds of thousands of URLs, returning proper 304 responses saves bandwidth and server time on unchanged pages, which leaves more of Googlebot's crawl capacity for URLs that have actually changed. Run a quick check on a representative URL with the HTTP Header Checker to confirm your server is sending Last-Modified and a valid ETag.
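On the server side, the decision to answer 304 can be sketched roughly as below. This is an illustration under stated assumptions, not how any particular web server implements it: request_headers is a plain dict with exact-case keys, etag is the resource's current ETag string, and last_modified is a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    """Drop the W/ prefix so weak and strong forms of a tag compare equal."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Return True when the server may answer 304 Not Modified."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        if if_none_match.strip() == "*":
            return True
        sent = {_strip_weak(t) for t in if_none_match.split(",")}
        return _strip_weak(etag) in sent

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is None:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: serve the full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore microseconds.
    return last_modified.replace(microsecond=0) <= since


changed_at = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
headers = {"If-Modified-Since": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(is_not_modified(headers, '"v1"', changed_at))
```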

Vary: The Header That Breaks Mobile-First Indexing

If you serve different HTML to mobile and desktop user agents from the same URL (dynamic serving), you should send:

Vary: User-Agent

Without it, intermediate caches such as CDNs and proxies may serve the desktop version to mobile users, and Googlebot gets no hint that a mobile variant exists when it crawls for mobile-first indexing. Vary: Accept-Encoding matters for the same reason when you serve compressed responses.
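A small check for this could look like the Python sketch below. The function names and the two boolean flags are assumptions for illustration; you would set them based on what you know about how the site is served.

```python
def vary_fields(header_value):
    """Return the field names listed in a Vary header, lowercased."""
    return {f.strip().lower() for f in header_value.split(",") if f.strip()}


def missing_vary(header_value, dynamic_serving=False, compressed=False):
    """List the Vary entries this response should have but does not."""
    fields = vary_fields(header_value)
    if "*" in fields:
        return []  # "Vary: *" already says the response varies on anything
    missing = []
    if dynamic_serving and "user-agent" not in fields:
        missing.append("User-Agent")
    if compressed and "accept-encoding" not in fields:
        missing.append("Accept-Encoding")
    return missing


print(missing_vary("Accept-Encoding", dynamic_serving=True, compressed=True))
```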

Status Codes and Their Headers

The status code itself is a signal, but the accompanying headers determine behaviour:

  1. 301 / 308 — permanent redirects. Must include a valid Location header. Google consolidates signals to the target.
  2. 302 / 307 — temporary redirects. Google eventually treats long-lived 302s as 301s, but the signal is weaker.
  3. 404 / 410 — not found / gone. Both lead to removal from the index; a 410 may be processed slightly faster, though in practice the difference is small.
  4. 429 / 503 — rate limited / unavailable. Send Retry-After so Googlebot backs off during short outages. URLs that keep returning these codes for an extended period can still be dropped.
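  5. 401 / 403 — unauthorized / forbidden. A 401 arrives with a WWW-Authenticate challenge that Googlebot cannot answer, so pages behind it will not be indexed. A 403 blocks the crawler in the same way.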
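Retry-After can be either a number of seconds or an HTTP date, which is easy to get wrong when auditing. The Python sketch below normalises both forms to a delay in seconds; retry_after_seconds is a hypothetical helper, and the optional now argument exists only to make the function testable.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None):
    """Return the delay in seconds, or None if the value is malformed."""
    value = header_value.strip()
    if value.isdigit():
        return int(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry immediately".
    return max(0, int((when - now).total_seconds()))


print(retry_after_seconds("120"))
```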

A Quick Audit Workflow

Run this on any URL you care about ranking:

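  1. Fetch the response headers with curl -I -A "Googlebot" followed by the URL. The user agent matters because some servers vary or cloak responses by user agent. Note that -I sends a HEAD request, which a few servers answer differently from a GET.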
  2. Confirm the status code is 200 for indexable pages.
  3. Check for an X-Robots-Tag. If present, verify the directives are intentional.
  4. Verify Content-Type includes charset=UTF-8.
  5. Confirm Last-Modified or ETag is present and changes when content updates.
  6. If you use dynamic serving, confirm Vary: User-Agent.
  7. For canonicalised PDFs or assets, verify the Link header.
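The unconditional checks in the list above (steps 2 to 5) can be sketched as a small Python script using only the standard library. Treat it as a starting point under a few assumptions: fetch_headers and audit are hypothetical names, the user agent string only imitates Googlebot (your server may treat the real crawler differently, for example by verifying its IP address), urlopen follows redirects so you see the final status, and repeated headers collapse to the last value.

```python
import re
import sys
import urllib.error
import urllib.request

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def fetch_headers(url):
    """GET the URL as a Googlebot-like client; return (status, headers)."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, {k.lower(): v for k, v in resp.headers.items()}
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry headers worth inspecting.
        return err.code, {k.lower(): v for k, v in err.headers.items()}


def audit(status, headers):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if status != 200:
        findings.append(f"status is {status}, expected 200 for an indexable page")

    # Conservative: flags noindex/none even when scoped to another crawler.
    robots = re.split(r"[,:\s]+", headers.get("x-robots-tag", "").lower())
    if "noindex" in robots or "none" in robots:
        findings.append("X-Robots-Tag blocks indexing")

    ctype = headers.get("content-type", "").lower()
    if "charset=utf-8" not in ctype.replace(" ", "").replace('"', ""):
        findings.append("Content-Type does not declare charset=UTF-8")

    if "last-modified" not in headers and "etag" not in headers:
        findings.append("no Last-Modified or ETag validator")
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    code, hdrs = fetch_headers(sys.argv[1])
    for finding in audit(code, hdrs) or ["no issues flagged"]:
        print(finding)
```

Saved under a file name of your choice, it takes the URL as its only argument. The Vary and Link checks are left out because they only apply to dynamically served pages and canonicalised assets.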

The fastest way to do this without juggling curl flags is the AXOX Hub HTTP Header Checker — paste a URL, choose the Googlebot user agent, and review every header Google sees in one view.

Common Header Mistakes That Hurt Rankings

  • Leaving X-Robots-Tag: noindex on production after a migration
  • Returning 200 OK on soft-404 pages instead of 404 or 410
  • Missing Vary: User-Agent on dynamically served sites
  • Sending Cache-Control: no-store on pages you want crawled efficiently
  • Redirect chains where intermediate hops strip the canonical Link header
  • Serving HTML with Content-Type: text/plain from misconfigured CDNs

Audit a key URL now and confirm Googlebot is seeing exactly what you intend: axoxhub.com/tools/http-header-checker.
