Robots.txt Analyzer

Fetch and analyze any website's robots.txt. See allowed/blocked paths and sitemaps.

What is robots.txt?

The robots.txt file tells search engine crawlers which pages or files they can or can't request from your site. It's located at the root of your domain (e.g. example.com/robots.txt) and follows a standard format with User-Agent, Allow, and Disallow directives.
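A minimal robots.txt showing all three directives plus a Sitemap line (the paths and domain are illustrative):

```
User-agent: *
Disallow: /private/
Allow: /blog/
Sitemap: https://example.com/sitemap.xml
```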

Why analyze your robots.txt?

A misconfigured robots.txt can accidentally block search engines from indexing important pages — or expose private areas you meant to hide. Regular analysis helps you catch issues before they hurt your rankings.

Common robots.txt mistakes

  • Blocking CSS/JS files — prevents Google from rendering your pages correctly
  • Missing Sitemap directive — the Sitemap line helps crawlers discover your full site structure
  • Using Disallow: / without intent — this single line blocks crawlers from your entire site
  • Forgetting trailing slashes — Disallow: /admin matches every URL that starts with /admin, while Disallow: /admin/ matches only that directory and its contents
  • No robots.txt at all — crawlers will crawl and index everything they can reach, including staging or admin pages
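Several of these checks can be scripted with Python's standard-library urllib.robotparser. A minimal sketch, assuming an illustrative rule set and paths (replace them with your own):

```python
from urllib.robotparser import RobotFileParser

# Parse rules from a string instead of fetching over the network.
rules = """
User-agent: *
Disallow: /admin/
Disallow: /assets/css/
Sitemap: https://example.com/sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Flag blocked CSS/JS paths, which can break Google's rendering.
for path in ("/admin/", "/assets/css/site.css", "/blog/post"):
    allowed = parser.can_fetch("Googlebot", f"https://example.com{path}")
    print(path, "allowed" if allowed else "BLOCKED")

# site_maps() (Python 3.8+) returns the Sitemap lines, or None if absent.
print(parser.site_maps())
```

To analyze a live site instead, call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` rather than `parse()`.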

Tips

  • Always include a Sitemap: line pointing to your XML sitemap
  • Test changes with Google Search Console's robots.txt tester before deploying
  • Use specific paths instead of broad blocks — be surgical, not sweeping
  • Remember: robots.txt is publicly visible. Never rely on it to hide sensitive content
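For example, a surgical configuration blocks only the sections that should stay out of search, instead of a sweeping Disallow: / (the paths here are illustrative):

```
User-agent: *
# Surgical: block only the checkout flow, not the whole site
Disallow: /checkout/
Disallow: /cart/
Sitemap: https://example.com/sitemap.xml
```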

Frequently Asked Questions

What is a robots.txt file?

A robots.txt file sits at the root of a website (e.g. example.com/robots.txt) and tells search engine crawlers which pages or sections they are allowed or not allowed to visit. It follows the Robots Exclusion Standard.

Can robots.txt stop Google from indexing my pages?

Robots.txt prevents crawling, but it does not prevent indexing. Google may still index a blocked URL if other sites link to it — it just won't be able to read the content. To prevent indexing, use a noindex meta tag or X-Robots-Tag header instead.
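To keep a page out of the index while still allowing it to be crawled, the noindex meta tag looks like this:

```
<!-- Inside the page's <head> -->
<meta name="robots" content="noindex">
```

Or, as an HTTP response header (useful for non-HTML files such as PDFs): `X-Robots-Tag: noindex`. Note that for either to work, the URL must not be blocked in robots.txt, or crawlers will never see the directive.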

Where must robots.txt be located?

Robots.txt must be at the root of the domain: example.com/robots.txt. It cannot be placed in a subdirectory. Each subdomain needs its own separate robots.txt file at its own root.

What is the difference between Allow and Disallow?

Disallow tells crawlers to skip a path (e.g. Disallow: /admin/). Allow overrides a Disallow to permit access to a specific path within a blocked directory — useful when you block a folder but need one subfolder to remain crawlable.
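A typical override pattern (directory names are illustrative):

```
User-agent: *
# Block the assets folder, but keep the CSS subfolder crawlable
Disallow: /assets/
Allow: /assets/css/
```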