What HTTP Headers Does Google Actually Look For?
Find out exactly what HTTP headers Google looks for when crawling and indexing your site, and how to audit them with practical examples.
Googlebot reads more than your HTML. Before it ever parses a <title> tag or follows a link, it consumes a stack of HTTP response headers that quietly decide whether your page gets crawled, indexed, cached, or ignored. Misconfigure one and a perfectly good page can vanish from search results without a single visible error.
Here's exactly which HTTP headers Google pays attention to, what values it expects, and how to verify yours are sending the right signals.
The Headers Googlebot Actually Reads
Google has confirmed support for a specific set of response headers across its developer documentation and Search Central blog. These are the ones that materially affect crawling and indexing:
- X-Robots-Tag — indexing directives at the HTTP level
- Link (rel="canonical") — canonical URL signal
- Content-Type — MIME type and character encoding
- Content-Encoding — gzip, br, deflate compression
- Content-Length — payload size hints
- Cache-Control / Expires / ETag / Last-Modified — caching and conditional requests
- Vary — content negotiation signals
- Retry-After — backoff behaviour on 429/503
- Location — redirect destination on 3xx responses
- WWW-Authenticate — authentication challenges (and the trap that comes with them)
X-Robots-Tag: The Header That Controls Indexing
If you only audit one header, make it this one. X-Robots-Tag applies the same directives as the <meta name="robots"> tag, but at the HTTP level — which means it works for PDFs, images, and other non-HTML resources.
Common directives Google supports
- noindex: exclude from the index
- nofollow: don't follow links on the page
- none: equivalent to noindex, nofollow
- noarchive: no cached copy
- nosnippet: no description snippet
- max-snippet:[number]
- max-image-preview:[setting]
- max-video-preview:[number]
- unavailable_after:[date]: drop after a date
Example for a downloadable PDF you don't want indexed
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow
You can also target specific crawlers: X-Robots-Tag: googlebot: noindex. The biggest mistake I see is leaving a noindex header on a staging environment that gets promoted to production. Always check this header before launch.
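To check this header in a launch script, the value has to be split into an optional crawler name and a set of directives. The following is a minimal sketch, not Google's parser: the function names are hypothetical, the directive set is limited to the ones listed above, and it assumes the unavailable_after date contains no commas.

```python
# Directive names taken from the list above; any other token that sits
# before the first colon is assumed to be a crawler name like "googlebot".
KNOWN_DIRECTIVES = {
    "noindex", "nofollow", "none", "noarchive", "nosnippet",
    "max-snippet", "max-image-preview", "max-video-preview",
    "unavailable_after",
}


def parse_x_robots_tag(value):
    """Split one X-Robots-Tag value into (crawler, directives).

    crawler is None when the header applies to every crawler.
    directives maps each name to its value, or True for bare flags.
    """
    crawler = None
    head, sep, rest = value.partition(":")
    head_name = head.strip().lower()
    if sep and "," not in head and head_name not in KNOWN_DIRECTIVES:
        crawler, value = head_name, rest

    directives = {}
    for item in value.split(","):
        name, sep, arg = item.strip().partition(":")
        if name:
            directives[name.strip().lower()] = arg.strip() if sep else True
    return crawler, directives


def blocks_indexing(directives):
    """True when the directives keep the resource out of the index."""
    return "noindex" in directives or "none" in directives


crawler, directives = parse_x_robots_tag("googlebot: noindex")
print(crawler, directives, blocks_indexing(directives))
```

Checking for exact directive names rather than substrings matters here: a value such as max-image-preview:none contains the text "none" without blocking indexing.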
Canonical via the Link Header
For non-HTML files like PDFs, you can't use <link rel="canonical"> in the document. Google instead reads the Link response header:
Link: <https://example.com/whitepaper.pdf>; rel="canonical"
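For PDFs and other non-HTML files, this is the most direct way to consolidate duplicate URLs, syndicated documents, or assets served from a CDN under a different hostname. Google treats it as a hint rather than a directive, the same as an HTML rel="canonical" element.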
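A Link header can carry several comma-separated links, so an audit needs to pick out the one whose rel is canonical. Here is a small sketch using regular expressions; the function name is hypothetical, and it deliberately ignores edge cases such as a quoted title parameter that itself contains "rel=".

```python
import re

# One link-value: <target> followed by its parameters, up to the next "<".
_LINK_VALUE = re.compile(r"<([^>]+)>([^<]*)")
# The rel parameter, quoted or bare.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def canonical_from_link_header(value):
    """Return the first target whose rel includes "canonical", else None."""
    for target, params in _LINK_VALUE.findall(value):
        rel = _REL_PARAM.search(params)
        # rel may hold several space-separated relation types.
        if rel and "canonical" in rel.group(1).lower().split():
            return target.strip()
    return None


header = '<https://example.com/whitepaper.pdf>; rel="canonical"'
print(canonical_from_link_header(header))
```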
Content-Type and Character Encoding
Google uses Content-Type to decide how to parse the response. A missing or wrong charset can cause garbled snippets in search results, especially for non-Latin scripts.
Send this for HTML pages:
Content-Type: text/html; charset=UTF-8
If the charset is missing in the header and not declared in the HTML, Google falls back to guessing — and guessing wrong on Cyrillic, Arabic, or East Asian content is common.
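To flag responses that leave the charset undeclared, the header can be split into its MIME type and charset with Python's standard library. This is a sketch under the assumption that you already have the raw Content-Type value as a string; both function names are hypothetical.

```python
from email.message import Message


def parse_content_type(value):
    """Return (mime_type, charset); charset is None when not declared."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


def missing_charset(value):
    """True for text/* responses that declare no charset in the header."""
    mime_type, charset = parse_content_type(value)
    return mime_type.startswith("text/") and charset is None


print(parse_content_type("text/html; charset=UTF-8"))
print(missing_charset("text/html"))
```

Note that get_content_charset() lower-cases the value, so UTF-8 comes back as utf-8.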
Caching Headers Googlebot Respects
Googlebot supports conditional requests, which saves your crawl budget on large sites. The relevant headers are:
- Last-Modified: Googlebot may send If-Modified-Since on subsequent requests; respond with 304 Not Modified when the content hasn't changed.
- ETag: paired with If-None-Match for the same purpose.
- Cache-Control: values like no-cache, private, and max-age influence how Google's caching infrastructure handles the response.
For a site with hundreds of thousands of URLs, returning proper 304 responses lets Googlebot skip re-downloading unchanged pages and spend those requests on content that has actually changed. Run a quick check on a representative URL with the HTTP Header Checker to confirm your server is sending Last-Modified and a valid ETag.
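On the server side, the 304 decision comes down to comparing the request's validators with the resource's current ones. The sketch below shows that comparison in isolation. It assumes request headers arrive as a plain dict with canonical capitalisation and that last_modified is a timezone-aware datetime; the function names are hypothetical. If-None-Match is checked first because HTTP gives it precedence over If-Modified-Since.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque(tag):
    """Drop the weak-validator prefix so W/"x" compares equal to "x"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers, etag, last_modified):
    """Decide whether a conditional GET can be answered with 304."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        if if_none_match.strip() == "*":
            return True
        return any(_opaque(t) == _opaque(etag) for t in if_none_match.split(","))

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: treat the header as absent
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_modified.replace(microsecond=0) <= since
    return False


modified = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
request = {"If-Modified-Since": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(304 if is_not_modified(request, '"abc"', modified) else 200)
```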
Vary: The Header That Breaks Mobile-First Indexing
If you serve different HTML to mobile and desktop user agents from the same URL (dynamic serving), send:
Vary: User-Agent
Without it, CDNs and proxy caches may store the desktop version and serve it to mobile users, and Googlebot gets no hint that the URL has a separate mobile variant to crawl for mobile-first indexing. Vary: Accept-Encoding matters for the same reason when you serve compressed responses.
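Because Vary is a comma-separated, case-insensitive list, a plain string comparison against "User-Agent" is easy to get wrong. A minimal sketch of a more forgiving check, with hypothetical function names:

```python
def vary_fields(value):
    """Return the Vary header's field names, lower-cased."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def varies_on(value, header_name):
    """True when the response varies on header_name (or on everything)."""
    fields = vary_fields(value)
    return "*" in fields or header_name.lower() in fields


print(varies_on("Accept-Encoding, User-Agent", "User-Agent"))
print(varies_on("Accept-Encoding", "User-Agent"))
```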
Status Codes and Their Headers
The status code itself is a signal, but the accompanying headers determine behaviour:
- 301 / 308: permanent redirects. Must include a valid Location header. Google consolidates signals to the target.
- 302 / 307: temporary redirects. Google eventually treats long-lived 302s as 301s, but the signal is weaker.
- 404 / 410: not found / gone. Google handles the two almost identically; a 410 may be dropped from the index slightly sooner.
- 429 / 503: rate limited / unavailable. Send Retry-After so Googlebot backs off without dropping URLs.
- 401 / 403: authentication required / forbidden. A 401 carries a WWW-Authenticate challenge, and Googlebot does not log in, so pages behind either status will not be indexed.
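Retry-After comes in two forms, a number of seconds or an HTTP date, and it is worth confirming that your 429 and 503 responses send a value that parses as one of them. The sketch below converts either form to seconds. The function name is hypothetical, and the now parameter exists only so the date form can be checked against a fixed clock.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Return the delay in seconds, or None if the value is malformed."""
    value = value.strip()
    if value.isascii() and value.isdigit():
        return int(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry immediately".
    return max(0, int((when - now).total_seconds()))


clock = datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc)
print(retry_after_seconds("120"))
print(retry_after_seconds("Wed, 21 Oct 2015 07:30:00 GMT", now=clock))
```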
A Quick Audit Workflow
Run this on any URL you care about ranking:
- Fetch the response with curl -I -A "Googlebot" (the user agent matters; some servers cloak responses).
- Confirm the status code is 200 for indexable pages.
- Check for an X-Robots-Tag. If present, verify the directives are intentional.
- Verify Content-Type includes charset=UTF-8.
- Confirm Last-Modified or ETag is present and changes when content updates.
- If you use dynamic serving, confirm Vary: User-Agent.
- For canonicalised PDFs or assets, verify the Link header.
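The checklist above can be sketched as a single function that takes a status code and a dict of response headers and returns a list of warnings. Fetching is left out on purpose: the headers could come from curl output or any HTTP client. The function name, its parameters, and the warning wording are all hypothetical, and it cannot check whether validators change when content updates, since that needs two fetches.

```python
def audit_headers(status, headers, dynamic_serving=False, expect_canonical=False):
    """Return a list of warnings for one response; empty means no findings."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    warnings = []

    if status != 200:
        warnings.append(f"status is {status}, not 200")

    if "x-robots-tag" in h:
        warnings.append(f"X-Robots-Tag present ({h['x-robots-tag']}): confirm it is intentional")

    content_type = h.get("content-type", "").lower().replace(" ", "").replace('"', "")
    if "charset=utf-8" not in content_type:
        warnings.append("Content-Type does not declare charset=UTF-8")

    if "last-modified" not in h and "etag" not in h:
        warnings.append("neither Last-Modified nor ETag is present")

    if dynamic_serving:
        vary = [field.strip().lower() for field in h.get("vary", "").split(",")]
        if "user-agent" not in vary:
            warnings.append("dynamic serving without Vary: User-Agent")

    if expect_canonical and "canonical" not in h.get("link", "").lower():
        warnings.append("no canonical Link header")

    return warnings


sample = {"Content-Type": "text/html", "X-Robots-Tag": "noindex"}
for warning in audit_headers(200, sample, dynamic_serving=True):
    print("-", warning)
```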
The fastest way to do this without juggling curl flags is the AXOX Hub HTTP Header Checker — paste a URL, choose the Googlebot user agent, and review every header Google sees in one view.
Common Header Mistakes That Hurt Rankings
- Leaving X-Robots-Tag: noindex on production after a migration
- Returning 200 OK on soft-404 pages instead of 404 or 410
- Missing Vary: User-Agent on dynamically served sites
- Sending Cache-Control: no-store on pages you want crawled efficiently
- Redirect chains where intermediate hops strip the canonical Link header
- Serving HTML with Content-Type: text/plain from misconfigured CDNs
Audit a key URL now and confirm Googlebot is seeing exactly what you intend: axoxhub.com/tools/http-header-checker.