Robots.txt Checker: How to Test & Validate Your Robots.txt

NetVizor Team April 11, 2026
#robots.txt #SEO #search engines

Your website might be invisible to Google right now — and one tiny file is the reason.

Somewhere on your server, there's a plain text file called robots.txt. It's barely a few lines long. Most website owners never look at it. And yet, this little file has the power to completely hide your site from every search engine on the planet.

Think of robots.txt as the bouncer at a nightclub. It stands at the front door of your website and tells search engine crawlers — Googlebot, Bingbot, and the rest — which rooms they're allowed to enter and which ones are off-limits. Get it right, and search engines index exactly what you want. Get it wrong, and your carefully crafted pages might as well not exist.

The scary part? A single typo can block your entire site from being crawled. And you wouldn't even know it unless you check.

Ready to see if your robots.txt is helping or hurting your SEO? Run a free check right now with the NetVizor Robots.txt Checker — it takes seconds and could save you months of lost traffic.


How robots.txt Works

Every robots.txt file lives in the same place: the root of your domain. If your site is example.com, then your robots.txt is at example.com/robots.txt. Always. No exceptions.

The format is dead simple. Here's what a healthy robots.txt looks like:

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public-api/
Sitemap: https://example.com/sitemap.xml

Let's break down each directive:

User-agent — Which crawler this rule applies to. The asterisk * means "all bots." You can also target specific ones like Googlebot or Bingbot.

Disallow — Tells crawlers "don't go here." In the example above, we're keeping bots out of /admin/ and /private/. Simple enough.

Allow — Overrides a Disallow for a specific path. We blocked /admin/, but we still want bots to reach /admin/public-api/. Allow makes that exception.

Sitemap — Points crawlers to your XML sitemap. This is one of the most underrated lines in any robots.txt. It's like handing the bouncer a guest list — here's exactly who should be inside.

Crawl-delay — Asks bots to wait a certain number of seconds between requests. Googlebot ignores this directive entirely (Google retired Search Console's manual crawl-rate setting, so Googlebot now paces itself automatically), but Bingbot and others respect it. Useful if your server struggles under heavy crawling.
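If you want to sanity-check rules like these yourself, Python's standard library ships a robots.txt parser. A minimal sketch, using the example file from above (the bot name and URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Parse the sample robots.txt from above, line by line.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /private/",
    "Allow: /admin/public-api/",
]
parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) answers: may this bot crawl this URL?
print(parser.can_fetch("Googlebot", "https://example.com/"))               # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings")) # False
print(parser.can_fetch("Googlebot", "https://example.com/private/x.pdf"))  # False
```

One caveat: Python's parser applies rules in file order, while Google uses longest-path matching, so Allow exceptions like /admin/public-api/ can evaluate differently. For those edge cases, use a dedicated checker.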

Now here's the terrifying version:

User-agent: *
Disallow: /

This single line makes your entire website invisible to every search engine. The forward slash after Disallow: means "everything." Every page, every image, every PDF — all blocked. It's the most common catastrophic robots.txt mistake, and it happens more often than you'd think — especially after migrations or staging-to-production deployments where someone forgets to update the file.


7 Common robots.txt Mistakes That Kill Your SEO

1. Blocking CSS and JavaScript Files

User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /assets/

This was standard advice back in 2010. Today, it's an SEO disaster. Google renders your pages to understand them — it needs your CSS and JavaScript to see what users see. Block those files, and Google sees a broken, unstyled mess. Your rankings will tank.

Fix: Remove these Disallow rules entirely. Let crawlers access your static assets.

2. Blocking Entire Directories with Important Content

User-agent: *
Disallow: /blog/

Maybe you meant to block /blog/drafts/, but you just blocked your entire blog. Every post, every category page — gone from search results.

Fix: Be specific. Block only what you need to block:

User-agent: *
Disallow: /blog/drafts/
Disallow: /blog/preview/

3. Missing Sitemap Reference

User-agent: *
Disallow: /admin/

Technically valid. But you're leaving free SEO value on the table. Without a Sitemap directive, crawlers have to discover your pages on their own through links. Why make their job harder?

Fix: Always include your sitemap:

User-agent: *
Disallow: /admin/
Sitemap: https://yoursite.com/sitemap.xml

4. Using Disallow: / (Blocking Everything)

We covered this one already, but it bears repeating because it's that common. Staging environments almost always have this rule. When you push staging config to production without checking — boom, invisible website.

Fix: Check your robots.txt immediately after every deployment. Better yet, use the NetVizor Robots.txt Checker as part of your launch checklist.

5. Syntax Errors

User-agent: *
Dissallow: /admin/
disallow: /private

Misspelling Disallow as Dissallow? Crawlers won't understand it — that line is silently ignored. And while directives are technically case-insensitive, inconsistent casing can be a sign you're manually editing without validating.

Fix: Always run your robots.txt through a validator after editing.

6. Forgetting That Trailing Slashes Matter

Disallow: /private

This blocks /private, /private/, /private-photos/, /privately-shared/ — anything that starts with /private. That's probably not what you wanted.

Disallow: /private/

This blocks only the /private/ directory and its contents. The trailing slash makes it directory-specific.

Fix: Use trailing slashes when you mean to block directories. Be intentional about path matching.
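You can watch the prefix-matching difference directly with Python's built-in parser (the sample URL is illustrative):

```python
from urllib.robotparser import RobotFileParser

# Rule without a trailing slash: a bare prefix match.
loose = RobotFileParser()
loose.parse(["User-agent: *", "Disallow: /private"])

# Rule with a trailing slash: limited to the directory and its contents.
strict = RobotFileParser()
strict.parse(["User-agent: *", "Disallow: /private/"])

url = "https://example.com/privately-shared/photo.jpg"
print(loose.can_fetch("*", url))   # False: "/privately-shared/..." starts with "/private"
print(strict.can_fetch("*", url))  # True: it does not start with "/private/"
```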

7. No robots.txt at All

If your site returns a 404 for /robots.txt, search engines assume everything is fair game. They'll crawl your entire site — which sounds great until you realize they're also crawling your admin panels, search result pages, duplicate content, and other junk that dilutes your SEO.

Plus, you miss the chance to point crawlers to your sitemap.

Fix: Create a robots.txt file. Even a minimal one is better than nothing:

User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml

How to Check Your robots.txt (Step by Step)

Method 1: Use the NetVizor Robots.txt Checker (Recommended)

This is the fastest and most thorough approach:

  1. Go to NetVizor Robots.txt Checker
  2. Enter your domain name
  3. Hit "Check"
  4. Review the parsed results — the tool fetches your robots.txt, parses every directive, highlights syntax issues, and shows you exactly what's being blocked and allowed

The tool doesn't just show you the raw file — it interprets it. You'll see which user-agents have rules, which paths are blocked, whether a sitemap is declared, and if there are any issues worth fixing.

Method 2: Check Manually in Your Browser

Open a new tab and type yoursite.com/robots.txt. You'll see the raw file. This tells you what's there, but it won't catch syntax errors or logic mistakes. You're on your own for interpretation.

Method 3: Google Search Console

Google Search Console includes a robots.txt report (it replaced the old standalone tester in 2023). It shows which versions of your file Google has fetched, when it fetched them, and any parse errors it found. It's useful, but it only reflects Google's perspective — not Bing, not other crawlers.

Why the Online Checker Wins

Manual checking shows you the file. The online checker actually parses it — like a compiler for your robots.txt. It catches things your eyes miss: subtle syntax errors, conflicting rules, missing sitemaps, overly broad blocks. Think of it as the difference between reading code and running code.


How to Write a Perfect robots.txt

Template for a Typical Website

User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /search/
Disallow: /thank-you/
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Clean and simple. Block admin areas and internal pages, allow everything else, declare your sitemap.

Template for an E-Commerce Site

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /wishlist/
Disallow: /search?*
Disallow: /compare/
Disallow: /*?sort=
Disallow: /*?filter=
Allow: /

Sitemap: https://shop.example.com/sitemap.xml

E-commerce sites have tons of duplicate pages from filters, sorting, and search. Block the dynamic parameter URLs while keeping product and category pages open.

Template for a Blog

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /tag/
Disallow: /author/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://blog.example.com/sitemap.xml

For WordPress blogs, block the admin area (but allow admin-ajax.php since some themes need it), and consider blocking tag and author archives if they create thin content.

After creating or updating your robots.txt using any of these templates, always validate it with the Robots.txt Checker before pushing to production.


robots.txt vs. Meta Robots vs. X-Robots-Tag

These three all control how search engines interact with your content, but they work at very different levels:

| Feature | robots.txt | Meta Robots Tag | X-Robots-Tag |
|---|---|---|---|
| Scope | Entire site or directories | Individual pages | Individual URLs (any file type) |
| Location | Root of domain (/robots.txt) | HTML `<head>` section | HTTP response header |
| Controls crawling? | Yes — blocks crawlers from accessing URLs | No — page must be crawled to read the tag | No — file must be requested to read the header |
| Controls indexing? | No — only controls access | Yes — noindex prevents indexing | Yes — noindex prevents indexing |
| Works on non-HTML? | Yes | No — only HTML pages | Yes — PDFs, images, anything |
| Best for | Blocking entire sections, managing crawl budget | Page-level noindex/nofollow | Non-HTML files, CDN-level control |

Here's the key distinction most people miss: robots.txt controls crawling, not indexing. If a page is blocked by robots.txt but linked from elsewhere, Google might still index the URL (with a "No information is available for this page" snippet). To truly prevent indexing, you need noindex via meta robots or X-Robots-Tag.
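For instance, a noindex header is the only way to keep a file with no HTML head (like a PDF) out of the index while still letting crawlers fetch it. A hypothetical nginx snippet, assuming you serve PDFs directly (adjust the location pattern to your setup):

```nginx
# Send an X-Robots-Tag header for PDFs: crawlable, but never indexed.
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, nofollow";
}
```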

Want to check if your server is sending X-Robots-Tag headers? Use the HTTP Headers Checker to inspect your response headers directly.


Testing Changes Before Going Live

Never edit robots.txt directly on production and hope for the best. That's how "Disallow: /" ends up on a live site.

Here's the workflow that keeps you safe:

  1. Edit locally — Make your changes in a text editor or your CMS
  2. Validate — Paste the content into the NetVizor Robots.txt Checker or test it against specific URLs
  3. Deploy — Push the updated file to production
  4. Verify — Check the live URL (yoursite.com/robots.txt) and run it through the checker one more time to confirm it's serving correctly
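As a minimal automated guard for the validate step, a script along these lines can fail a CI build before a "Disallow: /" reaches production. The function name and domain are illustrative; it is a smoke test, not a full validator:

```python
from urllib.robotparser import RobotFileParser

def blocks_root(robots_txt: str, agent: str = "Googlebot") -> bool:
    """True if this robots.txt would block the given crawler from the homepage.

    A quick pre-deploy smoke test: syntax problems and subtler logic
    errors still need a real checker.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "https://example.com/")

staging = "User-agent: *\nDisallow: /\n"
production = "User-agent: *\nDisallow: /admin/\n"

print(blocks_root(staging))     # True: never ship this file
print(blocks_root(production))  # False
```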

After updating robots.txt, it's a great time to run a broader site health check. A few tools that pair well with your robots.txt review:

  • SSL Checker — Make sure your HTTPS certificate is valid. Search engines favor secure sites, and a broken SSL can cause crawl errors regardless of what robots.txt says.
  • Speed Test — Slow pages get crawled less frequently. Even a perfect robots.txt won't help if Google gives up waiting for your pages to load.
  • Security Score — Check for security headers and vulnerabilities. A compromised site can end up with injected robots.txt rules you never wrote.
  • HTML Validator — Validate your HTML to ensure crawlers can properly parse your pages once they get past robots.txt.
  • DNS Lookup — Verify your domain's DNS records are correct. Misconfigured DNS can prevent crawlers from reaching your robots.txt entirely.
  • WHOIS Lookup — Check domain registration details and expiry dates. An expired domain means no crawling at all.

FAQ

Does robots.txt block pages from appearing in Google?

Not exactly. robots.txt blocks crawling, not indexing. If Google can't crawl a page, it won't see the content — but if other sites link to that URL, Google might still list it in search results with a bare URL and no snippet. To truly block indexing, use a noindex meta tag or X-Robots-Tag header. But here's the catch: the page has to be crawlable for Google to see the noindex directive. So don't block a page with robots.txt AND add noindex — that's contradictory.

How often does Google check robots.txt?

Google typically re-fetches robots.txt roughly every 24 hours, but it can vary. After making changes, you might not see the effect immediately. Google caches the file, and major changes (like unblocking previously blocked sections) can take a few days to fully propagate. You can request a re-crawl through Google Search Console to speed things up.

Can I block specific bots like ChatGPT or AI crawlers?

Yes. AI companies use specific user-agent strings for their crawlers. Here's how to block the most common ones:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

GPTBot and ChatGPT-User are OpenAI's crawlers. CCBot is used by Common Crawl (which feeds many AI models). Google-Extended controls whether Google uses your content for AI training (like Gemini) while still allowing regular search indexing. Whether these bots actually respect your robots.txt is another question — but major companies have publicly committed to honoring it.

Does robots.txt protect private content?

Absolutely not. robots.txt is a public file — anyone can read it at yoursite.com/robots.txt. In fact, listing sensitive directories in robots.txt is like putting up a sign that says "secret stuff this way." Malicious actors specifically check robots.txt to find interesting paths to probe.

For actual security, use authentication, access controls, and server-side restrictions. robots.txt is for SEO management, not security.

What happens if I delete robots.txt?

If your server returns a 404 for /robots.txt, search engines treat it as "no restrictions" — they'll crawl everything they can find. This isn't necessarily bad, but you lose the ability to guide crawlers, manage crawl budget, or point them to your sitemap. For most sites, having even a basic robots.txt with just a Sitemap directive is better than having none at all.


Take Control of Your Crawling

Your robots.txt file is one of the oldest and most fundamental pieces of technical SEO. It's also one of the easiest to get wrong — and one of the hardest to notice when it breaks.

Don't wait until you notice a traffic drop to find out something went wrong. Be proactive. Check your robots.txt regularly, especially after site updates, migrations, or CMS changes.

Check your robots.txt now with NetVizor — it's free, it's instant, and it might just save your rankings.