Stop Search Engines from Crawling your Website

Search engine crawling can be disabled by adding rules to your robots.txt file, a plain-text file placed in the root of your website (for example, https://example.com/robots.txt). Note that robots.txt is advisory: well-behaved crawlers follow its rules, but it cannot enforce them.

Search engine User-agents

Common search engine User-agents include:

Googlebot
Yahoo! Slurp
bingbot

Search engine User-agents that are commonly blocked include:

AhrefsBot
Baiduspider
Ezooms
MJ12bot
YandexBot
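
For example, to block all of the bots listed above, you could add a group like this for each one to your robots.txt (a sketch; each group blocks one bot entirely):

User-agent: AhrefsBot
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Ezooms
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: YandexBot
Disallow: /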

Set a crawl delay for all search engines:

If you had 1,000 pages on your website, a search engine could potentially index your entire site in a few minutes.

However, loading all of those pages in such a short time period could cause high system resource usage.

A Crawl-delay of 30 seconds would allow crawlers to index your entire 1,000-page website in just 8.3 hours (1,000 pages × 30 seconds = 30,000 seconds).

A Crawl-delay of 500 seconds would stretch that out to about 5.8 days (1,000 pages × 500 seconds = 500,000 seconds).

You can set the Crawl-delay for all search engines at once with:

User-agent: *
Crawl-delay: 30
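
Be aware that not every crawler honors this directive; Googlebot, for example, ignores Crawl-delay entirely. For crawlers that do support it, you can also set a different delay for each bot. A sketch using User-agents from the lists above:

User-agent: bingbot
Crawl-delay: 10

User-agent: Yahoo! Slurp
Crawl-delay: 30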

Allow all search engines to crawl your website:

By default, search engines are able to crawl your website, but you can also state explicitly that they are allowed with:

User-agent: *
Disallow:

Disallow all search engines from crawling your website:

You can disallow all search engines from crawling your website with these rules:

User-agent: *
Disallow: /
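
Major crawlers such as Googlebot and bingbot also support a non-standard Allow: directive, which lets you carve an exception out of a broad Disallow. A sketch, using a hypothetical /public/ directory:

User-agent: *
# Block everything except the (hypothetical) /public/ directory
Disallow: /
Allow: /public/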

Disallow one particular search engine from crawling your website:

You can disallow just one specific search engine from crawling your website with these rules:

User-agent: Baiduspider
Disallow: /
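
To block several specific crawlers with the same rule, you can list multiple User-agent lines in a single group. A sketch using bots from the list above:

User-agent: Baiduspider
User-agent: YandexBot
Disallow: /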

Disallow all search engines from particular folders:

If we had a few directories, like /cgi-bin/, /private/, and /tmp/, that we didn’t want bots to crawl, we could use this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/
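
Keep in mind that Disallow matches URLs by prefix, so these rules block everything beneath each folder as well. For example:

User-agent: *
# Blocks /private/, /private/notes.html, /private/archive/2024/, etc.
# Note: omitting the trailing slash (Disallow: /private) would also
# block unrelated paths such as /private.html or /privateer/
Disallow: /private/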

Disallow all search engines from particular files:

If we had files like contactus.htm, index.htm, and store.htm that we didn’t want bots to crawl, we could use this:

User-agent: *
Disallow: /contactus.htm
Disallow: /index.htm
Disallow: /store.htm
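
Major crawlers such as Googlebot and bingbot also understand wildcard patterns (* and the end-of-URL anchor $) as a non-standard extension, which is handy for blocking every file of a given type. A sketch for blocking all PDF files:

User-agent: *
# Block every URL ending in .pdf (wildcard support varies by crawler)
Disallow: /*.pdf$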

Disallow all search engines but one:

If we wanted to allow only Googlebot access to our /private/ directory and disallow all other bots, we could use:

User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow:

When Googlebot reads our robots.txt file, it will see that it is not disallowed from crawling any directories.
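
Putting it all together, a complete robots.txt combining the techniques above might look like this (the directories and bots are the illustrative ones used earlier):

# Slow down all crawlers and keep them out of a few directories
User-agent: *
Crawl-delay: 30
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/

# Block an unwanted bot entirely
User-agent: AhrefsBot
Disallow: /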

Reference: https://www.inmotionhosting.com/support/website/restricting-bots/how-to-stop-search-engines-from-crawling-your-website
