Good afternoon, TechWelkin readers. Today I am going to tell you about the importance of a small text file called robots.txt in the context of search engine optimization (SEO). Last month, I was doing SEO for a client. My objective was to increase search engine traffic to the client's website.
When I analyzed their site, I found that most things were in order, but there were a few obvious problems too. After the analysis, I advised them to plug a few holes through which search ranking was leaking. Now, a month later, search traffic to that website has picked up and is still rising.
One of the major problems I noticed with that website was the absence of a robots.txt file. Let's see how this file can affect Google's view of your website.
What is robots.txt?
robots.txt is a plain text file, placed at the root of your website, that tells search engines what to crawl and what to leave alone on your web server. It contains directives for search engine bots, like Googlebot. These directives tell a bot which files and directories it is allowed to crawl and which it should stay away from. Anything that is not disallowed is treated as open to crawling.
This file is called robots.txt because it deals with search (ro)bots. Search bots (aka robots) are automated programs that crawl websites and build an index of what they find there.
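To make the idea of directives a little more concrete, here is a minimal sketch in Python that reads a small robots.txt and asks whether a given URL may be crawled. It uses the standard library's urllib.robotparser module. The file content, the example.com URLs and the helper name is_allowed are my own illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the blocked paths are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to crawl url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A normal blog post is not covered by any Disallow rule.
print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/blog/my-post/"))
# A file under /wp-admin/ matches a Disallow rule.
print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/wp-admin/options.php"))
```

The "User-agent: *" line means the rules apply to every bot, and each "Disallow" line names a path prefix that bots are asked not to crawl.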
Do search engines honor robots.txt?
Good and responsible search engines (like Google, Bing, Yahoo! etc.) do honor the directives written in this file. However, the file cannot forcibly stop a search bot that is bent upon spidering / crawling the disallowed locations; compliance is voluntary.
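The voluntary nature of robots.txt is easier to see from the crawler's side. Below is a rough sketch of what a well-behaved bot might do: before fetching anything, it filters its own list of candidate URLs against the rules. The bot name ExampleBot, the URLs and the function filter_crawlable are hypothetical and chosen only for illustration. Nothing on the server enforces the result; a rogue bot can simply skip this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one bot is shut out entirely, everyone else
# is only asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# Hypothetical URLs that a crawler has discovered and would like to fetch.
CANDIDATES = [
    "https://example.com/",
    "https://example.com/private/report.html",
]


def filter_crawlable(robots_txt, user_agent, urls):
    """Return only the URLs that robots_txt permits user_agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# A bot without its own section falls back to the "*" rules.
print(filter_crawlable(ROBOTS_TXT, "Googlebot", CANDIDATES))
# The bot named in the first section is asked to crawl nothing at all.
print(filter_crawlable(ROBOTS_TXT, "ExampleBot", CANDIDATES))
```

Note that the check lives entirely in the crawler's own code, which is why robots.txt should not be treated as a security measure.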
Why is robots.txt important for SEO?
In recent years, search engines have been focusing increasingly on the quality of a website. Everyone talks about content quality, and it is true that high-quality content is one of the biggest factors in your site's search rank. You write excellent stuff on your website. But if you think that search engines can find only what you post on your website, you're wrong! If you do not have a properly configured robots.txt in place, search engines can snake through any publicly accessible files and directories on your server that they manage to discover.
These files don't contain material that you want to offer to your website's users. But search engines may index them nonetheless, because you're not stopping them. Then, upon analysis, search engines may find the content of these files irrelevant to the overall theme of your website. As a result of this irrelevance, your rank may suffer.
Let's understand this with an example.
You have been working on your blog for several months and now, let's say, you have 100 good posts. Search engines have indexed them all. But because robots.txt is absent, search bots may also be indexing your “system files” (e.g. theme files, CMS files etc.) and other files that you have placed on the server. Assuming that you have 100 such files on the server, the total number of entries in the search index could be as high as 200 (100 posts + 100 other files present on the server). Of these entries, 50% are irrelevant to your website's users and you would want them out of Google's index. For this, you should configure your robots.txt file.
This is why it is very important to block certain portions of your web server from search bots, so that they index only what you really want to show to your visitors.
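As a rough illustration of blocking portions of a server, the sketch below builds a robots.txt from a list of directories and then checks that a post stays crawlable while a blocked file does not. The directory names and the helper build_robots_txt are assumptions made for this example; the right list depends entirely on your own CMS and server layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical directories that hold nothing meant for visitors.
BLOCKED_PATHS = ["/cgi-bin/", "/tmp/", "/admin/"]


def build_robots_txt(disallowed_paths):
    """Build robots.txt text that asks every bot to skip the given paths."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(BLOCKED_PATHS)
print(robots_txt)

# Sanity check: parse the generated text and test two example URLs.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "https://example.com/blog/post-1/"))  # post stays crawlable
print(parser.can_fetch("*", "https://example.com/tmp/cache.dat"))  # blocked file
```

The generated text would be saved as robots.txt at the root of the site, so that bots can find it at /robots.txt.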
If Google has indexed significantly more entries than the number of your actual posts, then you should look into it. Do a deep analysis of what Google has indexed and what it should actually be indexing.
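The comparison itself is simple arithmetic. Here is a small sketch using the numbers from the example above (200 indexed entries against 100 actual posts). The function name irrelevant_share is my own, and you would have to supply the two counts yourself from whatever reports you have.

```python
def irrelevant_share(indexed_entries: int, actual_posts: int) -> float:
    """Return the fraction of indexed entries that are not actual posts."""
    if indexed_entries <= 0:
        raise ValueError("indexed_entries must be positive")
    # If fewer entries are indexed than posts exist, nothing is "extra".
    extra = max(indexed_entries - actual_posts, 0)
    return extra / indexed_entries


share = irrelevant_share(indexed_entries=200, actual_posts=100)
print(f"{share:.0%} of indexed entries are not posts")  # prints "50% of indexed entries are not posts"
```

How large a share should worry you is a judgment call; the point is only to notice when the index is much bigger than your real content.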
I hope you now see why you need a well-configured robots.txt. You are likely to see good results if you configure it properly. Let me know if you have any questions or comments.