Robots.txt Checker: How to Test and Validate Your Robots.txt File

A single mistake in your robots.txt file can accidentally block Google from crawling your entire website. It happens more often than you think, and the consequences can devastate your search rankings overnight. This guide explains what robots.txt does, how to check it correctly, and how to fix the most common errors.
Check Your Robots.txt Now
🔍 Robots.txt Checker – Free Online Tool
Enter any domain and instantly see its robots.txt file, validate the syntax, and check which pages are blocked from crawlers.
What Is Robots.txt?
Robots.txt is a plain text file placed in the root directory of your website (yourdomain.com/robots.txt) that tells search engine crawlers which pages or sections they should or shouldn't access.
It's part of the Robots Exclusion Protocol, a standard that major search engines like Google, Bing, and others follow by convention (not by obligation).
What robots.txt controls:
- Which pages crawlers can access
- Which crawlers are affected (Google, Bing, specific bots)
- Where your XML sitemap is located
- Crawl delay between requests (honoured by some crawlers; Google ignores Crawl-delay)
How Robots.txt Works
When a search engine bot visits your site, it first checks yourdomain.com/robots.txt before crawling any page. Based on the rules it finds, it decides what to crawl.
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://yourdomain.com/sitemap.xml
Important distinction: Robots.txt controls crawling, not indexing. A page blocked by robots.txt won't be crawled, but it can still appear in search results if other pages link to it. To prevent indexing, use the noindex meta tag instead.
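To make the distinction concrete: a page you want kept out of search results carries a noindex directive in its HTML head (or an equivalent X-Robots-Tag HTTP header) while remaining crawlable, so the bot can actually see the instruction:

```html
<!-- In the page's <head>: crawling is allowed, indexing is not -->
<meta name="robots" content="noindex">
```

Note that if the same page is also blocked in robots.txt, the crawler never fetches it and never sees the noindex tag, so the two mechanisms should not be combined on one URL.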
Robots.txt Syntax Explained
Basic structure
User-agent: [bot name or *]
Disallow: [path to block]
Allow: [path to allow]
Crawl-delay: [seconds]
Sitemap: [full URL to sitemap]
User-agent directives
| User-agent | Crawler |
|---|---|
| * | All crawlers |
| Googlebot | Google (all) |
| Googlebot-Image | Google Images |
| Googlebot-Video | Google Video |
| Bingbot | Microsoft Bing |
| Slurp | Yahoo Search |
| DuckDuckBot | DuckDuckGo |
| facebookexternalhit | Facebook link previews |
| Twitterbot | Twitter/X link previews |
Allow and Disallow rules
# Block all crawlers from the entire site
User-agent: *
Disallow: /
# Allow all crawlers everywhere (default behavior)
User-agent: *
Disallow:
# Block a specific directory
User-agent: *
Disallow: /admin/
# Block a specific file
User-agent: *
Disallow: /private-page.html
# Block all PDFs
User-agent: *
Disallow: /*.pdf$
# Allow Google but block everything else
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /
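You can check how different crawlers are treated using Python's standard-library urllib.robotparser, which applies the same per-agent matching. A minimal sketch of the "allow Google, block everything else" rules above (the domain is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# The "allow Google but block everything else" rules from above.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse the rules directly; no network fetch needed

print(parser.can_fetch("Googlebot", "https://yourdomain.com/page"))  # True
print(parser.can_fetch("Bingbot", "https://yourdomain.com/page"))    # False
```

Because Googlebot matches its own User-agent group, the `*` group never applies to it; every other bot falls through to the blanket Disallow.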
Wildcard patterns
| Pattern | Matches |
|---|---|
| /admin/ | Exactly /admin/ and everything inside |
| /admin* | Anything starting with /admin |
| *.pdf$ | All URLs ending in .pdf |
| /*? | All URLs with query parameters |
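Under the hood these patterns are plain prefix matches with two extras: * matches any run of characters and a trailing $ anchors the end of the URL. A rough Python translation (an illustrative sketch, not Google's exact matcher):

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Convert a robots.txt path pattern to a start-anchored regex."""
    escaped = re.escape(pattern).replace(r"\*", ".*")
    if escaped.endswith(r"\$"):
        # A trailing $ anchors the pattern to the end of the URL.
        escaped = escaped[:-2] + "$"
    return re.compile(escaped)

print(bool(robots_pattern_to_regex("/*.pdf$").match("/files/report.pdf")))  # True
print(bool(robots_pattern_to_regex("/admin").match("/administrator")))      # True: prefix match
print(bool(robots_pattern_to_regex("/*?").match("/shop?color=red")))        # True
```

The second line is worth noticing: without a trailing slash or anchor, a rule like Disallow: /admin also matches /administrator, which is a common source of accidental blocking.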
How to Check Your Robots.txt File
Method 1: Online checker (fastest)
Use Robots.txt Checker NetVizor:
- Enter your domain
- See the current robots.txt content
- Check which paths are blocked or allowed
- Validate syntax errors
Method 2: Direct URL
Simply open yourdomain.com/robots.txt in your browser. If it returns a 404, you don't have a robots.txt file (which is fine: all pages are crawlable by default).
Method 3: Google Search Console
- Open Google Search Console
- Go to Settings → robots.txt
- Google shows the robots.txt it last fetched and when
This is especially useful to check whether Googlebot sees the same robots.txt as you do; caching issues can cause discrepancies.
Method 4: Google's Robots.txt Tester
- Open Google Search Console → Settings → robots.txt Tester (legacy tool)
- Test specific URLs against your current robots.txt
- See whether a URL is allowed or blocked
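A do-it-yourself version of these checks takes a few lines of Python with the standard-library parser. This sketch tests candidate URLs against a robots.txt body you have already fetched (the rules and domain here are placeholders):

```python
from urllib.robotparser import RobotFileParser

def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check one URL against a robots.txt body, like the online testers do."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

robots = "User-agent: *\nDisallow: /admin/\nAllow: /public/"
print(is_crawlable(robots, "Googlebot", "https://yourdomain.com/admin/users"))  # False
print(is_crawlable(robots, "Googlebot", "https://yourdomain.com/public/faq"))   # True
```

In a real script you would first download yourdomain.com/robots.txt (for example with urllib.request) and feed the response body into this function.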
Most Common Robots.txt Mistakes
Mistake 1: Accidentally blocking the entire site
The most catastrophic mistake:
# WRONG: blocks all crawlers from everything
User-agent: *
Disallow: /
This single rule prevents Google from crawling any page on your website, and rankings can disappear within days.
How it happens: Developers add this during site maintenance and forget to remove it. Always check robots.txt after a site launch or migration.
Mistake 2: Blocking CSS and JavaScript files
# WRONG: prevents Google from rendering your pages
User-agent: *
Disallow: /wp-content/
Disallow: /assets/
If Google can't access your CSS and JavaScript, it can't properly render your pages. This hurts rankings because Google sees a broken version of your site.
Fix: Allow Googlebot to access all resources needed to render pages.
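On WordPress-style setups, the fix is usually to stop blocking asset directories wholesale and explicitly allow renderable resources instead. An illustrative fragment (adjust the paths to your own theme and structure):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Allow: /*.css$
Allow: /*.js$
```

Google resolves Allow/Disallow conflicts by the most specific (longest) matching rule, so targeted Allow lines like these win over a broader Disallow.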
Mistake 3: Disallow without trailing slash
# Blocks /admin, /admin/ and ANY URL starting with /admin
# (including, say, /administrator-tools)
Disallow: /admin
# Blocks only the /admin/ directory and everything inside it
Disallow: /admin/
Robots.txt rules are prefix matches. Without the trailing slash you block every URL that starts with /admin, which is often more than you intended; with it, you block only the directory and its contents.
Mistake 4: Wrong file location or filename
Robots.txt must be:
- In the root directory (yourdomain.com/robots.txt)
- Named exactly robots.txt (lowercase)
- Served with 200 status (not a 301 redirect)
- Plain text format (text/plain)
A robots.txt at yourdomain.com/folder/robots.txt has no effect.
Mistake 5: Blocking important pages by accident
# Meant to block /private/secret
# Actually blocks ALL pages starting with /p
Disallow: /p
Always test your rules with Robots.txt Checker NetVizor before publishing.
Mistake 6: Using robots.txt to hide sensitive content
Robots.txt is publicly visible; anyone can read it. If you list sensitive directories in robots.txt, you're actually advertising their existence to bad actors.
Use server-side authentication to protect sensitive content, not robots.txt.
Robots.txt for Common CMS Platforms
WordPress
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /xmlrpc.php
Sitemap: https://yourdomain.com/sitemap_index.xml
Shopify
Shopify generates robots.txt automatically. You can customise it via the robots.txt.liquid template. Common additions:
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /account
Next.js / Nuxt.js
In Next.js, create public/robots.txt or use the next-sitemap package. In Nuxt 3, use the nuxt-simple-robots module or place it in the public/ directory.
Robots.txt vs Meta Noindex: What's the Difference?
These two mechanisms are often confused:
| | Robots.txt | Meta Noindex |
|---|---|---|
| Controls | Crawling | Indexing |
| Location | Root directory file | HTML <head> tag |
| Effect | Bot won't visit the page | Bot visits but won't index |
| Scope | Entire directories or patterns | Individual pages |
| Can still rank? | Yes (via links) | No |
When to use robots.txt:
- Block crawlers from admin areas, internal tools
- Prevent crawling of duplicate content
- Save crawl budget on large sites
When to use noindex:
- Remove specific pages from search results
- Thank-you pages, login pages, internal search results
Crawl Budget: Why Robots.txt Matters for Large Sites
For large websites (100,000+ pages), crawl budget becomes critical. Google doesn't crawl every page of every site on every visit; it allocates a certain number of crawl requests per site.
Wasting crawl budget on unimportant pages (faceted navigation, filtered URLs, duplicate content) means important pages get crawled less frequently.
Robots.txt helps by blocking low-value URLs:
# Block faceted navigation (common e-commerce issue)
User-agent: *
Disallow: /*?color=
Disallow: /*?sort=
Disallow: /*?page=
# Block internal search results
Disallow: /search/
XML Sitemap in Robots.txt
Always include your sitemap URL in robots.txt; it helps search engines find and crawl your content:
User-agent: *
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
If you have multiple sitemaps:
Sitemap: https://yourdomain.com/sitemap-pages.xml
Sitemap: https://yourdomain.com/sitemap-posts.xml
Sitemap: https://yourdomain.com/sitemap-images.xml
Use DNS Lookup NetVizor to verify that your sitemap's domain resolves correctly, and make sure all sitemap URLs return a 200 status.
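Since Sitemap: directives are global rather than tied to a User-agent group, extracting them for validation is straightforward. A minimal Python sketch (the example URLs are placeholders):

```python
def sitemap_urls(robots_txt: str) -> list:
    """Collect every Sitemap: directive from a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")  # split on the FIRST colon only
        if key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

robots = """User-agent: *
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap-pages.xml
Sitemap: https://yourdomain.com/sitemap-posts.xml"""
print(sitemap_urls(robots))
# ['https://yourdomain.com/sitemap-pages.xml', 'https://yourdomain.com/sitemap-posts.xml']
```

From there you can request each URL and confirm it returns a 200 status. Splitting on the first colon only matters because the URLs themselves contain "://".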
FAQ: Robots.txt Questions
Does robots.txt affect Google rankings? Indirectly, yes. Blocking important pages prevents Google from crawling and indexing them, which removes them from search results. Blocking CSS/JS hurts rendering quality. A clean, well-configured robots.txt helps Google crawl your site efficiently.
What happens if I don't have a robots.txt file? Nothing bad: all pages are crawlable by default. A missing robots.txt simply means no restrictions. Google won't penalise you for not having one.
Can I block specific countries or IPs with robots.txt? No. Robots.txt only controls crawlers, not human visitors, and not by location. Use server-side rules (Cloudflare, .htaccess, nginx config) to block IPs or countries.
Does every website need a robots.txt? Not necessarily. Small sites with no sensitive areas and no duplicate content issues don't need one. Larger sites, e-commerce platforms, and sites with admin areas should have one.
How quickly does Google update after I change robots.txt? Google typically re-fetches robots.txt within 24 hours. However, the effects on crawling can take days to propagate; previously blocked pages may take weeks to disappear from search results (or reappear after unblocking).
Can I use robots.txt to block AI crawlers? Yes. Specify the user-agent for AI crawlers:
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
Conclusion
Robots.txt is simple in concept but powerful in impact. A single misplaced rule can block your entire site from Google, while a well-crafted file can significantly improve how efficiently crawlers navigate your content.
Quick checklist:
- Robots.txt is at yourdomain.com/robots.txt
- No accidental Disallow: / for all crawlers
- CSS and JavaScript files are accessible to Googlebot
- Sitemap URL is included
- Sensitive directories use trailing slashes
- Tested with Robots.txt Checker NetVizor