Ok, so it’s not much of a “debate”, but there is certainly a lot of conflicting information out there about how best to use a robots.txt file for SEO. Some recommend creating a rather comprehensive file, while others say to keep it short and simple. And still others claim it’s not necessary to have one at all. I personally side with the latter two options, but that’s just me.
A robots.txt file’s main purpose is to simply tell the crawlers from the search engines (and other bots as well) which content they should skip when crawling your site. Most people’s first reaction to this is something along the lines of “well I want all of my pages crawled, right?” And in most cases (especially for small sites), they’re probably correct.
One argument for using your robots.txt file to disallow certain content from being crawled is to keep crawlers away from any duplicate content that you may have floating around your site. This duplicate content can come from having a print version of your pages or, if you’re using a CMS such as WordPress, from having a post filed under multiple categories. In some of these instances, you may want to use your robots.txt file to filter some of them out. In my personal opinion, however, this can sometimes (accidentally) do more harm than good.
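For the curious, here is roughly what such a rule could look like, along with a way to check it before it goes anywhere near your live site. Treat it as a sketch only: the /print/ folder and the yoursite.com URLs are made-up placeholders, and the check uses Python's built-in urllib.robotparser module, which may not interpret every rule exactly the way a given search engine's crawler does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/print/" is an assumed folder holding printer-friendly
# copies of pages. Your own site may keep duplicates somewhere else entirely.
PRINT_RULES = """\
User-agent: *
Disallow: /print/
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (nothing is fetched)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


print_parser = build_parser(PRINT_RULES)

# The print copy should be skipped; the normal post should still be crawlable.
print_copy_ok = print_parser.can_fetch("*", "https://yoursite.com/print/my-post")
normal_post_ok = print_parser.can_fetch("*", "https://yoursite.com/my-post")

print("print copy crawlable:", print_copy_ok)
print("normal post crawlable:", normal_post_ok)
```

Notice how little it takes to block something: a slightly broader path on that Disallow line would quietly hide far more than the print copies, which is exactly the kind of accident I'm warning about.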
So for those that are just getting started in SEO or are working on promoting their new (and possibly first) website, I’d recommend using a very basic robots.txt file that simply allows full access to your site. If that’s your wish, open your favorite text editor and enter the following:
User-agent: *
Disallow:
Save it as “robots.txt” and upload it into the root directory of your site (yoursite.com/robots.txt). This tells all the bots crawling your site not to skip any of your content.
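If you'd like some reassurance before uploading, a few lines of Python can confirm that the allow-all rules really don't block anything. This is just a sketch under a few assumptions: the sample paths are invented, skipped_paths is a helper name I made up for illustration, and Python's built-in urllib.robotparser stands in for a real crawler.

```python
from urllib.robotparser import RobotFileParser

# The basic allow-all file: every bot, nothing disallowed.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def skipped_paths(rules, paths, agent="*"):
    """Return the paths that `agent` would be told to skip under `rules`.

    Hypothetical helper for illustration; it only parses the text it is given.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Invented example paths; swap in a handful from your own site.
sample_paths = ["/", "/about/", "/blog/my-first-post/"]
skipped = skipped_paths(ALLOW_ALL, sample_paths)

print("paths a crawler would skip:", skipped)
```

An empty list here means none of the sample paths would be skipped, which is what you want from this file.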
I’m sure there are plenty of others that would argue both for and against this method, but for the beginner, it’s better to play it safe until you have a little better handle on how everything works.