Zack Live

How to Write WordPress robots.txt for SEO


What is a robots.txt? Here is the answer from Google:

A robots.txt file provides restrictions to search engine robots (known as “bots”) that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.

Do you need a robots.txt for your WordPress blog? Yes. Every WordPress blog has content that search engines don’t need to crawl, for example the wp-admin folder, which holds the admin files. Another reason to create a robots.txt is to keep search engines from reaching the same content through more than one URL: the same post can show up in its category archive, in the monthly archive, or on a tag page. Google doesn’t like duplicate content. A well-written robots.txt makes Google’s crawler more efficient, and that is a job well done for SEO.

Now the question is how to write a proper robots.txt. What should go in the file? Let’s make a list. The obvious folders and URL keywords come first. Here is the first part of my robots.txt file:

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */comments
Allow: /wp-content/uploads

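The first line, “User-agent: *”, means the rules that follow apply to every robot that isn’t given its own group further down (Googlebot, for example; Google notes that its AdsBot-Google crawler ignores the * group and has to be named explicitly). I blocked all of these folders except /wp-content/uploads, which I allowed because it contains the files uploaded for post content.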
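To see what these lines actually block, here is a small Python sketch of the matching rules Google documents for its crawlers: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. This is my own sketch, not Google’s parser; other robots may match differently, and the sample paths are made-up examples rather than real URLs from this blog.

```python
import re

# The "User-agent: *" rules from the robots.txt above, as (directive, pattern) pairs.
RULES = [
    ("Disallow", "/wp-admin"),
    ("Disallow", "/wp-includes"),
    ("Disallow", "/wp-content/plugins"),
    ("Disallow", "/wp-content/cache"),
    ("Disallow", "/wp-content/themes"),
    ("Disallow", "/trackback"),
    ("Disallow", "/comments"),
    ("Disallow", "/category/*/*"),
    ("Disallow", "*/trackback"),
    ("Disallow", "*/comments"),
    ("Allow", "/wp-content/uploads"),
]


def to_regex(pattern):
    """'*' matches any characters; a trailing '$' pins the match to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match() anchors at the start, so every pattern behaves as a prefix match.
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Longest matching pattern wins, Allow wins a tie, and no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if pattern and to_regex(pattern).match(path):
            is_allow = directive == "Allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical sample paths on a WordPress blog.
for path in [
    "/wp-admin/options.php",
    "/wp-content/uploads/2008/05/photo.jpg",
    "/2008/05/my-post/",
    "/2008/05/my-post/trackback/",
    "/category/news/",
]:
    print(f"{path:40} {'allowed' if is_allowed(path) else 'blocked'}")
```

One thing the sketch turns up: because WordPress category archives usually end in a slash, /category/*/* matches a top-level archive such as /category/news/ as well as nested ones. And since none of the Disallow lines covers /wp-content/uploads, the Allow line is harmless but not strictly needed.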

Next, I disallow every URL that contains a ?. I’ve changed my Permalinks setting, so none of my URLs should contain a ?. I added these two lines to robots.txt:

Disallow: /*?*
Disallow: /*?
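In practice these two lines do the same job: Google treats each pattern as a prefix, so the trailing * in /*?* adds nothing. Here is a quick Python check, using a small wildcard-to-regex translation of Google’s documented * and $ rules (my own sketch, not an official parser) and a few hypothetical URLs.

```python
import re


def to_regex(pattern):
    """Translate a robots.txt pattern: '*' = any characters, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match() anchors at the start, so each pattern acts as a prefix match.
    return re.compile(body + ("$" if anchored else ""))


with_star = to_regex("/*?*")
without_star = to_regex("/*?")

# Hypothetical URLs: two with a query string, two without.
for path in ["/?p=123", "/index.php?s=robots", "/2008/05/my-post/", "/about/"]:
    print(path, bool(with_star.match(path)), bool(without_star.match(path)))
```

Either line alone is enough under these rules; keeping both does no harm.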

We also need to stop files ending in extensions like .php and .js from being crawled. You can adjust this list to suit your own needs.

Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
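The $ at the end of these patterns matters: it means the URL has to end with that extension. Here is a minimal Python sketch, assuming Google’s documented wildcard rules (some other crawlers ignore * and $ entirely), that shows which hypothetical paths the eight lines block.

```python
import re

# The extension rules above; each one is a Disallow pattern.
EXTENSION_RULES = ["/*.php$", "/*.js$", "/*.inc$", "/*.css$",
                   "/*.gz$", "/*.wmv$", "/*.cgi$", "/*.xhtml$"]


def to_regex(pattern):
    """'*' matches any characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path):
    """True if any extension rule matches the path."""
    return any(to_regex(rule).match(path) for rule in EXTENSION_RULES)


# Hypothetical paths.
for path in ["/wp-login.php", "/wp-login.php?action=register",
             "/wp-content/themes/default/style.css", "/sitemap.xml.gz",
             "/2008/05/my-post/"]:
    print(path, "blocked" if is_blocked(path) else "not blocked by these lines")
```

Two things stand out. A .php URL with a query string, such as /wp-login.php?action=register, slips past /*.php$ (the /*? line earlier in the file covers it instead). And /*.gz$ also matches /sitemap.xml.gz, the sitemap URL listed further down in my file, so if you copy these lines it may be worth checking that the pattern doesn’t get in the way of your sitemap.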

Finally, I allow the Google image bot and the Google AdSense bot on the entire site.

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*

# allow Google adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
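Why do these extra groups work? A crawler obeys only the group that names it and falls back to the * group only when no group does, so Googlebot-Image never sees the Disallow lines above. The Python sketch below is a simplified version of that selection (exact, case-insensitive name match, else *), run on a shortened copy of the file. Stripping the trailing * from Mediapartners-Google* is an assumption about how a parser treats it, and real crawlers may add their own fallbacks.

```python
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-content/themes

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*

# allow Google adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
"""


def parse_groups(text):
    """Map each lowercased user-agent name to its list of (directive, value) rules."""
    groups, rules, prev_was_agent = {}, None, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not prev_was_agent:
                rules = []  # a User-agent line after rules starts a new group
            # Assumption: a trailing '*' on a name is ignored ("*" alone stays "*").
            name = value.rstrip("*").lower() or "*"
            groups.setdefault(name, rules)
            prev_was_agent = True
        elif key in ("allow", "disallow") and rules is not None:
            rules.append((key, value))
            prev_was_agent = False
    return groups


def rules_for(groups, crawler):
    """Use the group that names this crawler, else the '*' group, else no rules."""
    return groups.get(crawler.lower(), groups.get("*", []))


groups = parse_groups(ROBOTS_TXT)
for crawler in ["Googlebot", "Googlebot-Image", "Mediapartners-Google"]:
    print(crawler, rules_for(groups, crawler))
```

An empty “Disallow:” already means nothing is disallowed, so in these two groups the “Allow: /*” line is redundant, though harmless.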

You can copy my robots.txt for your WordPress blog. You will see there are three more lines in my file:

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://zacklive.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN

These lines are generated by the XML-Sitemap plugin and tell robots where the sitemap file is. There is also a plugin, KB Robots.txt, designed to help bloggers with the robots.txt file. You may want to take a look, but I don’t think a plugin is necessary for it.
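If you want to confirm that a parser picks up that Sitemap line, Python’s standard library can read it: urllib.robotparser.RobotFileParser has a site_maps() method (Python 3.8 and later). The text below is a trimmed copy of the file, parsed from a string rather than fetched from the site.

```python
from urllib.robotparser import RobotFileParser

# A trimmed copy of the robots.txt, including the plugin-generated lines.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin

# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://zacklive.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly instead of fetching it
print(parser.site_maps())  # ['http://zacklive.com/sitemap.xml.gz']
```

The Sitemap line is independent of any User-agent group, so where it sits in the file doesn’t matter.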

Once you’ve finished writing the robots.txt, you can use Google Webmaster Tools to analyze it.
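You can also try the file locally. Here is a minimal sketch with Python’s built-in urllib.robotparser; the example.com URLs are placeholders. Be aware that this parser does plain prefix matching and does not understand Google’s * and $ wildcards, so a rule like /*.php$ has no effect in it. That makes it a reasonable stand-in for a simple crawler, not for Googlebot.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin
Disallow: /*.php$
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "*" asks about the default (User-agent: *) group.
print(parser.can_fetch("*", "http://example.com/wp-admin/options.php"))  # False
print(parser.can_fetch("*", "http://example.com/wp-login.php"))  # True: wildcard rule not understood
print(parser.can_fetch("*", "http://example.com/2008/05/my-post/"))  # True
```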

Zack Live Free Resource


  • http://www.royzy.co.uk Roy Nottage

    Hi there, I was wondering if you could verify something for me? I have had my blog running for a few months now, but only a week or two ago realised my robots.txt is probably the reason my WordPress blog images don’t appear in Google. I had a gander to see what I should change it to, but after reading your post I don’t know whether the people I got advice from were right.

    The link to what I have is here http://royzy.co.uk/robots.txt

    If it is right and does allow Google Images to search it, then it might be because I haven’t given Google enough time yet to index them. If it is wrong, I’ll change it as you have described in your post.

    Cheers,

    Roy

  • http://zacklive.com Zack

    I guess it may take some time for Google to index your blog.

  • http://blog.rebeltraders.net Chuck

    After discovering this article about the robots file I went and checked mine. The only thing listed in my robots.txt file is the location for the sitemap, nothing else.

    I use the ‘All in One SEO’ plugin on my site. I have some options checked in the settings for ‘nofollow’. Is that the same thing?

    Or is my robots file seriously messed up?

    Thank you

  • http://zacklive.com Zack

    @Chuck
    Hi Chuck, you need to edit the robots.txt file yourself; otherwise there is nothing in it. You can try mine.

  • Pingback: SEO-Explained-By-Google

  • Hermann

    Why doesn’t allowing the AdSense bot on the whole site cause ranking degradation?

  • http://www.weterede.com/ Nacho Plaza

    That robots.txt has errors. Not all search engine robots support wildcards and the “Allow” directive. Google, Yahoo & Live.com accept them, but others don’t. Another piece of advice: place the “Allow” lines just after “User-agent”, because if a “Disallow” that matches the URL comes before them, the spider will not index it.

    Check Google’s recommendations for robots.txt syntax and the Wikipedia article:
    http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
    http://en.wikipedia.org/wiki/Robots_exclusion_standard

    Use a robots.txt syntax validator to check your file (it flags “Allow” lines as errors, but they are correct for Google, Yahoo and Live, though not for the * User-agent):

    http://tool.motoricerca.info/robots-checker.phtml

    You can check the robots.txt on my blog ( http://www.weterede.com/robots.txt ). I am new to WordPress, so it is still under construction, because I don’t know the WordPress internal structure.

  • http://www.seovegas.com Vegas SEO

    It takes time for Google to get your website indexed.

    Be patient; it will be working soon.

  • http://www.helloeverything.co.uk David the Web Designer

    Thanks for posting, just what I needed for my site. I used to just disallow the whole wp-admin folder.

    • http://intensedebate.com/people/ZackLive ZackLive

      That's a good move.

  • http://www.techdunes.com sudeep

    Hello, can you suggest a robots.txt for my blog? Please help me with this, because I am confused about which one to use and whether I am doing it correctly. You can check my website and reply to me.
    http://www.techdunes.com
    (At present I am using your robots.txt example above, but please tell me whether it is okay for my blog.)

    I would be grateful to you.


  • http://choteustad.com/ Chote Ustad

    Here is a standard robots.txt file. I think it is perfect for all blogs:

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /category
    Disallow: /tag
    Disallow: /author
    Disallow: /trackback
    Disallow: /*trackback
    Disallow: /*trackback*
    Disallow: /*/trackback
    Disallow: /*?*
    Disallow: /*.html/$
    Disallow: /*feed*

    # Google Image
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    If you install WordPress in a separate directory (for example wordpress, blog, or any other name), use the version below.

    I am using wordpress here; if your directory has another name, replace wordpress with it.

    User-agent: *
    Disallow: /cgi-bin
    Disallow: /wordpress/wp-admin
    Disallow: /wordpress/wp-includes
    Disallow: /wordpress/wp-content/plugins
    Disallow: /wordpress/wp-content/cache
    Disallow: /wordpress/wp-content/themes
    Disallow: /category
    Disallow: /tag
    Disallow: /author
    Disallow: /trackback
    Disallow: /*trackback
    Disallow: /*trackback*
    Disallow: /*/trackback
    Disallow: /*?*
    Disallow: /*.html/$
    Disallow: /*feed*

    # Google Image
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    This form won’t let me write the last line:

    Sitemap:

    Copy it from the post above.

  • http://georyl.com Gee

    Hi, I’m using the Thesis theme for my self-hosted WordPress blog. Where can I find the robots.txt file? Sorry, it might seem like a silly question, but I’ve been trying to figure out the answer for some time now, because Google AdSense says I’m blocking their robot. Thanks!

  • http://www.articlereleases.com John

    WordPress generates a virtual robots.txt file; the code is around line 1700 of wp-includes/functions.php.

  • http://www.62design.co.uk/blog Aaron

    Hi,

    One thing I should point out (which has only come to my attention recently) is that robots.txt is a file where you explicitly disallow bots from searching pages. It is exclusive rather than inclusive, and as such does not require (nor support) ‘Allow’ commands!
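The comments above disagree about Allow. Support varies by parser: Python’s built-in urllib.robotparser, for one, accepts Allow lines but applies the first rule that matches, so order matters, which is the situation Nacho’s advice about placing Allow first guards against. Google instead documents longest-match precedence, under which order doesn’t matter. A minimal sketch of the first-match behaviour, with a placeholder example.com URL:

```python
from urllib.robotparser import RobotFileParser

DISALLOW_FIRST = """\
User-agent: *
Disallow: /wp-content
Allow: /wp-content/uploads
"""

ALLOW_FIRST = """\
User-agent: *
Allow: /wp-content/uploads
Disallow: /wp-content
"""


def can_fetch(robots_txt, url):
    """Parse robots.txt text and ask whether any robot ('*') may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


url = "http://example.com/wp-content/uploads/photo.jpg"
print(can_fetch(DISALLOW_FIRST, url))  # False: the first matching rule is the Disallow
print(can_fetch(ALLOW_FIRST, url))  # True: the first matching rule is the Allow
```

Under Google’s longest-match rule both files would allow the photo, since “Allow: /wp-content/uploads” is a longer match than “Disallow: /wp-content”.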