robots.txt

When you make a copy of your site for development or testing purposes, you don't want search engines to index it, as that will cause duplicate content.

To stop search engines from indexing a website, create a file named robots.txt in the site's document root with the following content.

User-agent: *
Disallow: /
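You can check what a set of robots.txt rules does with Python's standard urllib.robotparser module. This is a quick sketch using example.com as a placeholder domain; it confirms the rules above block every path for every bot.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules shown above.
rules = """User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Every path is disallowed, for any user agent.
print(rp.can_fetch("*", "https://example.com/"))                   # False
print(rp.can_fetch("Googlebot", "https://example.com/page.html"))  # False
```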

To allow all robots, use:

User-agent: *
Allow: /

To specify a sitemap, add a Sitemap line:

User-agent: *
Allow: /
Sitemap: https://serverok.in/sitemap.xml

Crawl-delay

User-agent: *
Allow: /
Sitemap: https://serverok.in/sitemap.xml
Crawl-delay: 10
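Python's urllib.robotparser can also read the Sitemap and Crawl-delay lines back out, which is a handy way to sanity-check the file before deploying it (site_maps() requires Python 3.8+):

```python
from urllib.robotparser import RobotFileParser

# The rules shown above, with a sitemap and a crawl delay.
rules = """User-agent: *
Allow: /
Sitemap: https://serverok.in/sitemap.xml
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.crawl_delay("*"))  # 10
print(rp.site_maps())       # ['https://serverok.in/sitemap.xml']
```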

Crawl-delay sets the number of seconds a bot should wait between page requests. In this case, the bot waits 10 seconds before fetching the next page. Bing and Yahoo support this directive; Googlebot ignores it.

To allow search engines to index only the home page and deny all other pages:

User-agent: *
Allow: /$
Disallow: /

The $ anchors the pattern to the end of the URL, so Allow: /$ matches only the root page. Note that $ is a pattern extension supported by major crawlers such as Google and Bing, not part of the original robots.txt standard.

For the Nginx web server, you can block search engines from indexing your site by adding the following to nginx.conf. This is useful for a development website; don't forget to remove it when you take the site live.

add_header X-Robots-Tag "noindex, nofollow";
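For context, the directive is typically placed inside the server block of the development site, so it applies to every response from that virtual host. This is a minimal sketch; dev.example.com and the root path are placeholders:

```nginx
server {
    listen 80;
    server_name dev.example.com;   # placeholder dev hostname
    root /var/www/dev;             # placeholder document root

    # Tell crawlers not to index or follow links on any page.
    # Remove this line when the site goes live.
    add_header X-Robots-Tag "noindex, nofollow";
}
```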
