SEO · May 16, 2026 · 5 min read

What HTTP Headers Does Google Actually Look For?

Find out exactly what HTTP headers Google looks for when crawling and indexing your site, and how to audit them with practical examples.

Googlebot reads more than your HTML. Before it ever parses a <title> tag or follows a link, it consumes a stack of HTTP response headers that quietly decide whether your page gets crawled, indexed, cached, or ignored. Misconfigure one and a perfectly good page can vanish from search results without a single visible error.

Here's exactly which HTTP headers Google pays attention to, what values it expects, and how to verify yours are sending the right signals.

The Headers Googlebot Actually Reads

Google has confirmed support for a specific set of response headers across its developer documentation and Search Central blog. These are the ones that materially affect crawling and indexing:

X-Robots-Tag — indexing directives at the HTTP level
Link (rel="canonical") — canonical URL signal
Content-Type — MIME type and character encoding
Content-Encoding — gzip, br, deflate compression
Content-Length — payload size hints
Cache-Control / Expires / ETag / Last-Modified — caching and conditional requests
Vary — content negotiation signals
Retry-After — backoff behaviour on 429/503
Location — redirect destination on 3xx responses
WWW-Authenticate — authentication challenges (and the trap that comes with them)

X-Robots-Tag: The Header That Controls Indexing

If you only audit one header, make it this one. X-Robots-Tag applies the same directives as the <meta name="robots"> tag, but at the HTTP level — which means it works for PDFs, images, and other non-HTML resources.

Common directives Google supports

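noindex: exclude from the index
nofollow: don't follow links on the page
none: equivalent to noindex, nofollow
noarchive: historically suppressed the cached-copy link; Google Search has since retired that feature, though other search engines may still honour the directive
nosnippet: no description snippet
max-snippet:[number], max-image-preview:[setting], max-video-preview:[number]: limits on text snippet length, image preview size, and video preview duration
unavailable_after:[date]: drop from results after the given date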

Example for a downloadable PDF you don't want indexed

HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow

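You can also target specific crawlers: X-Robots-Tag: googlebot: noindex. Keep in mind that Googlebot only sees the header if it is allowed to fetch the URL; a robots.txt disallow on the same path hides the noindex from it. The biggest mistake I see is leaving a noindex header on a staging environment that gets promoted to production. Always check this header before launch.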
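To make the crawler-targeting syntax concrete, here is a minimal Python sketch that turns one or more X-Robots-Tag values into a per-crawler set of directives. The function names parse_x_robots_tag and blocks_indexing are hypothetical, and the parsing is deliberately simplified: it assumes the date in unavailable_after contains no commas.

```python
# Directives that carry their own "name:value" colon, so the colon
# must not be mistaken for a crawler prefix such as "googlebot:".
VALUED_DIRECTIVES = {
    "max-snippet", "max-image-preview", "max-video-preview", "unavailable_after",
}

def parse_x_robots_tag(values):
    """Map crawler name ('*' means all crawlers) to a set of directives."""
    rules = {}
    for value in values:
        agent = "*"
        head, sep, rest = value.partition(":")
        name = head.strip().lower()
        # A leading "token:" is a crawler name unless it is a valued directive.
        if sep and "," not in name and name not in VALUED_DIRECTIVES:
            agent, value = name, rest
        for directive in value.split(","):
            directive = directive.strip().lower()
            if directive:
                rules.setdefault(agent, set()).add(directive)
    return rules

def blocks_indexing(rules, agent="googlebot"):
    """True if the general or crawler-specific rules contain noindex or none."""
    directives = rules.get("*", set()) | rules.get(agent, set())
    return bool(directives & {"noindex", "none"})

rules = parse_x_robots_tag(["googlebot: noindex", "max-snippet:50"])
print(rules, blocks_indexing(rules))
```

Pass every X-Robots-Tag line from the response as a separate list item, since a server can send the header more than once.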

Canonical via the Link Header

For non-HTML files like PDFs, you can't use <link rel="canonical"> in the document. Google instead reads the Link response header:

Link: <https://example.com/whitepaper.pdf>; rel="canonical"

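This is the most direct way to consolidate duplicate PDF URLs, syndicated documents, or assets served from a CDN under a different hostname. Google treats it as a strong hint rather than a directive, so keep your redirects and sitemap entries consistent with it.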
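If you want to check this header in a script, the following Python sketch pulls the canonical target out of a Link value. The helper name canonical_from_link is hypothetical, and the regular expression is a simplification that assumes parameter values contain no commas or semicolons.

```python
import re

# Matches "<url>" followed by its ";name=value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')

def canonical_from_link(header):
    """Return the URL whose rel includes "canonical", or None if absent."""
    for url, params in LINK_RE.findall(header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "rel":
                # rel may list several space-separated relation types.
                rels = value.strip().strip('"').lower().split()
                if "canonical" in rels:
                    return url
    return None

print(canonical_from_link('<https://example.com/whitepaper.pdf>; rel="canonical"'))
```

A single Link header can carry several comma-separated links (preload hints, for example), which is why the sketch loops over every match instead of reading only the first.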

Content-Type and Character Encoding

Google uses Content-Type to decide how to parse the response. A missing or wrong charset can cause garbled snippets in search results, especially for non-Latin scripts.

Send this for HTML pages:

Content-Type: text/html; charset=UTF-8

If the charset is missing in the header and not declared in the HTML, Google falls back to guessing — and guessing wrong on Cyrillic, Arabic, or East Asian content is common.
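A quick way to inspect what a Content-Type value actually declares is Python's standard email.message parser, which handles the same parameter syntax. This is a sketch; parse_content_type and charset_problem are hypothetical helper names, and the UTF-8 check simply mirrors the recommendation above.

```python
from email.message import Message

def parse_content_type(value):
    """Return (media_type, charset); charset is None when not declared."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()

def charset_problem(value):
    """Describe a charset issue for HTML responses, or return None if fine."""
    media_type, charset = parse_content_type(value)
    if media_type != "text/html":
        return None  # this sketch only checks HTML responses
    if charset is None:
        return "no charset declared"
    if charset != "utf-8":
        return f"charset is {charset}, not utf-8"
    return None

print(parse_content_type("text/html; charset=UTF-8"))
print(charset_problem("text/html"))
```

Both the media type and the charset come back lower-cased, so comparisons do not need to worry about UTF-8 versus utf-8.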

Caching Headers Googlebot Respects

Googlebot supports conditional requests, which saves your crawl budget on large sites. The relevant headers are:

Last-Modified: Googlebot may send If-Modified-Since on subsequent requests; respond with 304 Not Modified when the content hasn't changed.
ETag: paired with If-None-Match for the same purpose.
Cache-Control: the max-age value can hint at how soon a URL is worth recrawling; directives like no-cache and private mainly govern browsers and intermediary caches.

For a site with hundreds of thousands of URLs, answering unchanged pages with 304 means Googlebot skips re-downloading them, which frees crawl capacity for new and updated URLs. Run a quick check on a representative URL with the HTTP Header Checker to confirm your server is sending Last-Modified and a valid ETag.
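The server-side decision behind a 304 can be sketched in a few lines of Python. This is an illustration rather than a full implementation of the HTTP conditional-request rules: conditional_status is a hypothetical function, request_headers is assumed to be a plain dict with canonical header names, and last_modified is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def _strip_weak(tag):
    """Drop the W/ prefix so weak and strong ETags compare equal."""
    return tag[2:] if tag.startswith("W/") else tag

def conditional_status(request_headers, etag, last_modified):
    """Return 304 when the client's validators still match, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates:
            return 304
        matches = _strip_weak(etag) in {_strip_weak(t) for t in candidates}
        return 304 if matches else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304
        except (TypeError, ValueError):
            pass  # unparseable or naive date: fall through to a full response
    return 200

modified = datetime(2026, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_status({"If-None-Match": '"abc"'}, '"abc"', modified))
```

Most web servers and frameworks already implement this logic for static files; the sketch is mainly useful for reasoning about dynamically generated pages, where validators are often missing.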

Vary: The Header That Breaks Mobile-First Indexing

If you serve different HTML to mobile and desktop user agents from the same URL (dynamic serving), send:

Vary: User-Agent

Without it, a CDN or other shared cache can store the desktop HTML and hand it to mobile visitors, or to Googlebot Smartphone, the crawler that mobile-first indexing relies on. Vary: Accept-Encoding matters for the same reason when you serve compressed responses.
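To see why the header matters, consider how a shared cache builds its lookup key. The Python sketch below is a simplified model, not the behaviour of any particular CDN: cache_key is a hypothetical function, and it ignores the special Vary: * value.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus every request header named in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)

url = "https://example.com/"
mobile = {"User-Agent": "ExampleMobileBrowser"}
desktop = {"User-Agent": "ExampleDesktopBrowser"}

# Without Vary, both clients map to the same cached entry.
print(cache_key(url, mobile, "") == cache_key(url, desktop, ""))
# With Vary: User-Agent, each client gets its own entry.
print(cache_key(url, mobile, "User-Agent") == cache_key(url, desktop, "User-Agent"))
```

In this model the first comparison is equal and the second is not, which is the difference between one shared copy of the page and a separate copy per user agent.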

Status Codes and Their Headers

The status code itself is a signal, but the accompanying headers determine behaviour:

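301 / 308: permanent redirects. Must include a valid Location header. Google consolidates signals to the target.
302 / 307: temporary redirects. Google may eventually treat a long-lived 302 like a 301, but it is a weaker canonicalisation signal.
404 / 410: not found / gone. Both lead to removal from the index; 410 may be processed slightly faster.
429 / 503: rate limited / unavailable. Send Retry-After so Googlebot backs off; errors that persist for days can still get URLs dropped.
401 / 403: authentication required / forbidden. A 401 carries the WWW-Authenticate challenge. Googlebot cannot log in, so pages behind either status will not be indexed. Avoid using these codes to throttle crawling; that is what 429 and 503 are for.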
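Retry-After accepts two formats, a number of seconds or an HTTP date, and a client has to handle both. The Python sketch below shows one way to normalise the value to a delay in seconds; retry_after_seconds is a hypothetical helper, and how Googlebot itself interprets the header is not something this code claims to reproduce.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, now=None):
    """Convert a Retry-After value to a delay in seconds, or None if invalid."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return int(value)  # delta-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assume UTC when unspecified
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative delay.
    return max(0, int((when - now).total_seconds()))

print(retry_after_seconds("120"))
```

The optional now argument exists so the date form can be tested against a fixed clock.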

A Quick Audit Workflow

Run this on any URL you care about ranking:

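Fetch the response headers with curl -I -A "Googlebot" followed by the URL (the user agent matters because some servers cloak responses). Note that -I sends a HEAD request, which a few servers answer differently from a GET.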
Confirm the status code is 200 for indexable pages.
Check for an X-Robots-Tag. If present, verify the directives are intentional.
Verify Content-Type includes charset=UTF-8.
Confirm Last-Modified or ETag is present and changes when content updates.
If you use dynamic serving, confirm Vary: User-Agent.
For canonicalised PDFs or assets, verify the Link header.
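The checklist above can be scripted. The Python sketch below uses only the standard library; fetch_headers and audit are hypothetical names, the checks are deliberately coarse (any crawler-specific noindex is flagged, and the Link header step is left out), and urllib follows redirects automatically, so the status you see is that of the final response. Sending a Googlebot user agent string does not make the request come from Google, so servers that verify crawlers by IP may still respond differently.

```python
import urllib.error
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_headers(url):
    """Fetch a URL with a Googlebot-style user agent; return (status, headers)."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            # dict() keeps only one value per header name.
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry headers worth auditing.
        return err.code, dict(err.headers.items())

def audit(status, headers, dynamic_serving=False):
    """Return a list of findings for one response; an empty list means no issues."""
    h = {name.lower(): value for name, value in headers.items()}
    findings = []
    if status != 200:
        findings.append(f"status is {status}, not 200")
    robots = h.get("x-robots-tag", "").lower().replace(":", ",")
    if {t.strip() for t in robots.split(",")} & {"noindex", "none"}:
        findings.append("X-Robots-Tag blocks indexing")
    ctype = h.get("content-type", "").lower().replace(" ", "").replace('"', "")
    if ctype.startswith("text/html") and "charset=utf-8" not in ctype:
        findings.append("HTML response without charset=UTF-8")
    if "last-modified" not in h and "etag" not in h:
        findings.append("no Last-Modified or ETag validator")
    if dynamic_serving:
        vary = {v.strip().lower() for v in h.get("vary", "").split(",")}
        if "user-agent" not in vary:
            findings.append("dynamic serving without Vary: User-Agent")
    return findings

print(audit(200, {"Content-Type": "text/html", "X-Robots-Tag": "noindex"}))
```

To audit a live page, pass the result of fetch_headers straight in: audit(*fetch_headers("https://example.com/")).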

The fastest way to do this without juggling curl flags is the AXOX Hub HTTP Header Checker — paste a URL, choose the Googlebot user agent, and review every header Google sees in one view.

Common Header Mistakes That Hurt Rankings

Leaving X-Robots-Tag: noindex on production after a migration
Returning 200 OK on soft-404 pages instead of 404 or 410
Missing Vary: User-Agent on dynamically served sites
Sending Cache-Control: no-store on pages you want crawled efficiently
Redirect chains whose final response is missing the canonical Link header
Serving HTML with Content-Type: text/plain from misconfigured CDNs

Audit a key URL now and confirm Googlebot is seeing exactly what you intend: axoxhub.com/tools/http-header-checker.
