What is a robots.txt? Here is the answer from Google:
A robots.txt file provides restrictions to search engine robots (known as “bots”) that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
Do you need a robots.txt for your WordPress blog? Yes, you do. Every WordPress blog has content that search engines don’t need to index, for example the wp-admin folder, which holds the admin files. Another reason to create a robots.txt is to keep search engines from reaching the same content at more than one location: the same post can show up in the category folders, the monthly archives, or under a particular tag. Google doesn’t like duplicate content. A good robots.txt file makes Google’s crawler more efficient, and that’s a job well done for SEO.
Now the question is how to write a proper robots.txt. What should be included in the file? Let’s make a list. Some obvious folders and URL keywords should be written down first. Here is the first part of my robots.txt file:
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */comments
Allow: /wp-content/uploads
The first line, “User-agent: *”, means the rules that follow apply to all robots (Googlebot, Adsbot-Google, etc.). I disallowed the folders listed and explicitly allowed the /wp-content/uploads folder, because that folder contains the files uploaded for post content.
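If you want to sanity-check rules like these before uploading them, here is a rough sketch using Python's standard urllib.robotparser module. It works on a trimmed copy of the rules with the wildcard lines left out, because as far as I know that parser only does plain prefix matching. The sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the rules above (wildcard lines omitted on purpose).
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Allow: /wp-content/uploads
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical paths, chosen only to exercise the rules.
for path in ("/wp-admin/index.php", "/wp-content/uploads/photo.jpg", "/2009/05/my-post/"):
    verdict = "allowed" if parser.can_fetch("*", path) else "blocked"
    print(path, "->", verdict)
```

With this parser the admin path comes back blocked, while the upload and the ordinary post URL come back allowed. Real crawlers may resolve conflicts between Allow and Disallow differently, so treat this as a quick check and not the final word.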
Next, I disallow every URL containing a ?. Because I changed my Permalinks setting, there shouldn’t be any ? in my URLs. I added these two lines to robots.txt:
Disallow: /*?*
Disallow: /*?
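To see what these two wildcard lines actually match, here is a small sketch. It assumes the Google-style behaviour where * stands for any run of characters and a rule is matched from the start of the path. The helper name rule_matches and the sample URLs are my own inventions, and the sketch ignores the $ end marker.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a wildcard robots.txt pattern matches the start of path."""
    # Escape each literal piece, then let every "*" match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.match(regex, path) is not None

# Hypothetical URLs: an old-style query-string link and a pretty permalink.
for rule in ("/*?", "/*?*"):
    print(rule, rule_matches(rule, "/index.php?p=12"), rule_matches(rule, "/2009/05/my-post/"))
```

Both rules catch the query-string URL and leave the permalink alone. Under this matching model the two lines behave the same, so keeping both is harmless but one of them would be enough.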
We also need to keep files ending in extensions like .php, .js, etc. from being indexed. You can modify this list depending on your own needs.
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
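If you change this list often, a few lines of Python can generate the rules for you. This is only a convenience sketch; the names EXTENSIONS and extension_rules are made up for the example.

```python
# Extensions to block; edit this list to suit your own site.
EXTENSIONS = ["php", "js", "inc", "css", "gz", "wmv", "cgi", "xhtml"]

def extension_rules(extensions):
    """Build one 'Disallow: /*.ext$' line per extension."""
    return [f"Disallow: /*.{ext}$" for ext in extensions]

print("\n".join(extension_rules(EXTENSIONS)))
```

The output is the same eight Disallow lines as above, ready to paste into robots.txt.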
Finally, I allow the Google image bot and the Google AdSense bot on the entire site.
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
# allow Google adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
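A named User-agent group takes over from the * group for that bot. Here is a rough sketch of that with Python's standard urllib.robotparser, using a shortened rule set. I left out the "Allow: /*" line because that parser appears to treat * literally, and the empty Disallow already means "nothing is blocked". The file path and the second bot name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Shortened rules: one blocked folder for everyone, plus the image-bot group.
RULES = """\
User-agent: *
Disallow: /wp-content/themes

User-agent: Googlebot-Image
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

path = "/wp-content/themes/mytheme/logo.png"  # hypothetical file
print("Googlebot-Image:", parser.can_fetch("Googlebot-Image", path))
print("SomeOtherBot:", parser.can_fetch("SomeOtherBot", path))
```

The image bot is allowed to fetch the file, while any other bot falls back to the * group and is blocked.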
You can copy my robots.txt for your WordPress blog. You will see there are 3 more lines in my file:
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://zacklive.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN
These lines are generated by the XML-Sitemap plugin and tell robots where the sitemap file is. There is also a plugin, KB Robots.txt, designed to help bloggers with robots.txt. You may like to take a look, but I don’t think a plugin is necessary for the robots.txt file.
After you finish writing the robots.txt, you can go to Google Webmaster Tools to analyse it.
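To confirm that the Sitemap line can be read back out of the file, here is a minimal sketch with Python's standard urllib.robotparser. It assumes Python 3.8 or later, where the site_maps() method is available, and uses a cut-down copy of the file.

```python
from urllib.robotparser import RobotFileParser

# Cut-down copy of the file: one rule plus the plugin's Sitemap line.
RULES = """\
User-agent: *
Disallow: /wp-admin
Sitemap: http://zacklive.com/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# site_maps() returns the Sitemap URLs found, or None if there are none.
print(parser.site_maps())
```

This prints a one-item list containing the sitemap URL, which is the same information a crawler picks up from that line.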

8 Responses to “How to Write WordPress robots.txt for SEO”
Hi there, I was wondering if you could verify something for me? I have had my blog running for a few months now, but only a week or 2 ago realised my robots.txt is probably the reason my wordpress blog images don’t appear in Google. I had a gander to see what I should change it to, but I don’t know if they (who I got advice from) were right or not after reading your post.
The link to what I have is here http://royzy.co.uk/robots.txt
If it is right and does allow Google Images to search it, then it might be because I haven’t given Google enough time yet to index them. If it is wrong, I’ll change it as you have described in your post.
Cheers,
Roy
I guess it may take some time for google to index your blog.
After discovering this article about the robots file I went and checked mine. The only thing listed in my robots.txt file is the location for the sitemap, nothing else.
I use the ‘all in one seo’ plugin on my site. I have some options checked in the settings for ‘nofollow’. Is that the same thing?
Or is my robots file seriously messed up?
Thank you
@Chuck
Hi Chuck, you need to edit the robots.txt file yourself, or else there will be nothing in it. You can try mine.
[...] Use the robots.txt file. Tell the search engines where and what to index and not index. For example, you dont want them to index your download page, right? Google also provides a useful too for that as well: Google robots.txt Tool Tips: disallow your internal SERPs to be indexed, near duplicate pages and URLs created by a proxy service. How to write a robot.txt file for Wordpress [...]
Why doesn’t allowing the AdSense bot on the whole site degrade ranking?
That robots.txt has errors. Not all search engine robots support wildcards and the “Allow” directive. Google, Yahoo & Live.com accept them, but others do not. Another piece of advice: place the “Allow” lines just after “User-agent”, because if a “Disallow” that matches the URL comes first, the spider will not index it.
Check the Google recommendations for robots.txt syntax & the Wikipedia information:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
http://en.wikipedia.org/wiki/Robots_exclusion_standard
Use a robots.txt syntax validator to check your file (this one flags “Allow” lines as errors; they are valid for Google, Yahoo and Live, but not under the * User-agent):
http://tool.motoricerca.info/robots-checker.phtml
You can check the robots.txt on my blog ( http://www.weterede.com/robots.txt ). I am new to WordPress, so it is under construction, because I do not know the WordPress internal structure.
It takes time for Google to get your website indexed.
Be patient, it will be working soon.