What is robots.txt?
Robots.txt is a simple text file that tells web crawlers which parts of a website they may crawl, which in turn affects what ends up in a search engine's index. That alone makes it important for every website. The best part is that all major search engines support almost all of its directives, so you don't have to prepare a separate file for every crawler. But what exactly is robots.txt, and how do you create one for Google? Here is the important information you should know.
If you are in the SEO business, you have probably heard of the robots.txt file. It controls which of your pages crawlers can access, and blocked pages usually stay out of search results. If some of your pages are not being indexed, the first thing to do is check your robots.txt using the robots.txt Tester in Google Search Console.
If you find that some pages or URLs are accidentally blocked, remove those rules from your robots.txt file. A correct file is important for every website's indexing. So how do you create a robots.txt for Google and other major search engines? Without further ado, let's dive in.
How to Create a robots.txt for Google?
First of all, check whether you already have one. If you are using WordPress, you don't need to worry, because WordPress provides a robots.txt by default.
The default robots.txt for a WordPress site:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
You can confirm this by simply adding /robots.txt after your domain name, e.g. lokesharyan.com/robots.txt. If you see some directives, congratulations! Your website has a robots.txt file. If you get a 404 error page instead, follow this tutorial until the end.
For reference, here is the complete file we will build in this tutorial:

User-agent: *
Disallow:
Disallow: /?s=*

# Google Image Crawler Setup
User-agent: Googlebot-image
Disallow:

#Website Sitemap
Sitemap: https://www.lokesharyan.com/post-sitemap.xml
Sitemap: https://www.lokesharyan.com/page-sitemap.xml
Sitemap: https://www.lokesharyan.com/product-sitemap.xml
Sitemap: https://www.lokesharyan.com/category-sitemap.xml
Sitemap: https://www.lokesharyan.com/product_cat-sitemap.xml
If you don't have a robots.txt, you should create one. The process is very simple: follow Google's basic robots.txt guidelines and read them carefully.
In any case, please don't use:
User-agent: *
Disallow: /
That blocks all web crawlers from crawling your entire website. First, list all your top landing pages or priority pages. Then list all URLs that you want to block from web crawlers. If you want to target a specific crawler, name it in the User-agent line; otherwise use *.
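As a quick sanity check, you can verify locally that the blanket Disallow: / rule above really blocks everything. This is a sketch using Python's standard urllib.robotparser; example.com and the URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# The rule set warned about above: it blocks every crawler from every page.
rules = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Every URL on the site is disallowed, for any user agent.
print(rp.can_fetch("*", "https://example.com/"))                    # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post/"))  # False
```

If you see False for your homepage, the rule is doing exactly what you don't want.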
Now, put each URL path you want to block after a Disallow: directive.
You can also block internal search pages by using the following syntax:

User-agent: *
Disallow: /?s=*
If you want search engines to index your images, use this syntax:
# Google Image Crawler Setup
User-agent: Googlebot-image
Disallow:
Lines starting with # are comments and are ignored by crawlers. An empty Disallow: under User-agent: Googlebot-image allows Google's image crawler to crawl all images on your website.
You should also specify the location of your sitemaps. For that, use the following syntax:
#Website Sitemap
Sitemap: https://www.lokesharyan.com/post-sitemap.xml
Sitemap: https://www.lokesharyan.com/page-sitemap.xml
Sitemap: https://www.lokesharyan.com/product-sitemap.xml
Sitemap: https://www.lokesharyan.com/category-sitemap.xml
Sitemap: https://www.lokesharyan.com/product_cat-sitemap.xml
Add the full URL of each sitemap, and if you have multiple sitemaps, add them all. You should know the importance of multiple sitemaps. One caveat: if a URL you want to block also appears in a sitemap, remove it from the sitemap first, then block it in robots.txt. Once you are done, save the file and name it robots.txt.
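You can read those Sitemap lines back programmatically to confirm they were picked up. A small sketch with Python's standard urllib.robotparser (site_maps() requires Python 3.8+; the rules are pasted inline for illustration):

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow:

#Website Sitemap
Sitemap: https://www.lokesharyan.com/post-sitemap.xml
Sitemap: https://www.lokesharyan.com/page-sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# site_maps() returns every Sitemap URL in the file, or None if there are none.
for url in rp.site_maps():
    print(url)
```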
Where to put robots.txt?
Before putting it on the server, test it first. Log in to Google Search Console, open the robots.txt Tester, and check your URLs one by one. If everything works as expected, you are ready to add the file to the server.
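You can automate that one-by-one check locally before uploading. This is a sketch using Python's standard urllib.robotparser; the rules and URL list are placeholders, so substitute the file you are about to upload and your own priority pages:

```python
from urllib.robotparser import RobotFileParser

# Draft robots.txt (in practice, read this from the file you plan to upload).
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
"""

# Priority pages that must stay crawlable, plus a path that must stay blocked.
urls = [
    "https://www.lokesharyan.com/",
    "https://www.lokesharyan.com/blog/",
    "https://www.lokesharyan.com/wp-admin/",
]

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

for url in urls:
    status = "allowed" if rp.can_fetch("*", url) else "BLOCKED"
    print(f"{status}: {url}")
```

Note that urllib.robotparser implements the original robots.txt rules and does not understand wildcard patterns such as Disallow: /?s=*, so treat this as a first pass and keep Google Search Console's tester as the final check.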
Once your robots.txt file is ready, it's time to put it on the server. Log in to your server and open the root directory of your site. If a robots.txt already exists there, open it and replace its contents; otherwise, create a new file, paste in your directives, save the changes, and name the file robots.txt.
Once that's done, verify it by entering /robots.txt after your domain name. If you see the directives you added, the file is in place.
Finally, log in to Google Search Console again, paste your directives into the robots.txt Tester, and click the Submit button. Now you are done!