How to Write WordPress robots.txt for SEO
What is a robots.txt? Here is the answer from Google:
A robots.txt file provides restrictions to search engine robots (known as “bots”) that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages.
Does your WordPress blog need a robots.txt? Yes, it does. Every WordPress blog has content that search engines have no need to index, for example the wp-admin folder, which holds the admin files. Another reason to create a robots.txt is to keep search engines from reaching the same content through more than one location: the same post can appear in category folders, monthly archives, and tag pages. Google does not like duplicate content. A good robots.txt file makes Google’s crawler more efficient, and that is good for SEO.
Now the question is how to write a proper robots.txt. What should be included in the file? Let’s make a list. Start with the obvious folders and URL keywords. Here is the first part of my robots.txt file:
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */comments
Allow: /wp-content/uploads
The first line, “User-agent: *”, means the rules that follow apply to all robots (Googlebot, for example). I disallowed the WordPress system folders but allowed the /wp-content/uploads folder, because it contains the files uploaded for post content.
Next, I disallow every URL that contains a ?. I have changed my Permalinks setting, so no regular URL on my blog should contain a ?. I added these two lines to robots.txt:
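To see how plain folder rules like these behave, here is a minimal sketch using Python's standard `urllib.robotparser`. It uses a shortened copy of the rules above. The host `example.com`, the bot name `ExampleBot`, and the helper `is_allowed` are made up for illustration. Note that this parser does simple prefix matching and does not understand the `*` and `$` wildcards, so only the plain folder rules are tested.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the folder rules (plain prefixes only).
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/themes
Allow: /wp-content/uploads
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` (hypothetical helper)."""
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for path in (
        "/wp-admin/options.php",
        "/wp-content/uploads/2010/05/photo.jpg",
        "/2010/05/hello-world/",
    ):
        print(path, is_allowed(path))
```

A path is blocked when it starts with one of the Disallow prefixes. Anything that matches no rule, such as a normal post URL, stays crawlable.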
Disallow: /*?*
Disallow: /*?
We also need to keep files ending with extensions like .php, .js, etc. from being indexed. You can modify this list depending on your own needs.
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
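The wildcard rules are easier to reason about with a small test. The sketch below assumes the wildcard meaning that Google documents: `*` matches any run of characters and a trailing `$` anchors the end of the URL. The function names `pattern_to_regex` and `matches` are my own. The sketch only checks whether a single pattern matches a path; it does not model precedence between Allow and Disallow.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if the path (with query string) matches the pattern."""
    # re.match anchors at the start, like a robots.txt prefix match.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    checks = [
        ("/*?", "/?p=123"),
        ("/*?", "/2010/05/hello-world/"),
        ("/*.php$", "/wp-login.php"),
        ("/*.php$", "/wp-login.php?action=logout"),
        ("/*.css$", "/wp-content/themes/demo/style.css"),
    ]
    for pattern, path in checks:
        print(pattern, path, matches(pattern, path))
```

Under these assumptions, a URL such as /wp-login.php?action=logout is not caught by the `$`-anchored .php rule, because it does not end in .php. The ? rule catches it instead. It also follows that `/*?*` and `/*?` match the same URLs, since a pattern without `$` already matches as a prefix.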
Finally, I allow the Google image bot and the Google AdSense bot on the entire site.
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
# allow Google adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
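A bot that finds a group with its own name follows that group rather than the `*` group. The sketch below illustrates this with Python's standard `urllib.robotparser` and a cut-down file. The host `example.com`, the bot name `ExampleBot`, the theme path, and the helper `can_crawl` are made up for illustration. I left out the `Allow: /*` line because this parser treats `*` in a path literally; the empty `Disallow:` is enough to open the whole site.

```python
from urllib.robotparser import RobotFileParser

# Cut-down file: one general group and one group for the image bot.
RULES = """\
User-agent: *
Disallow: /wp-content/themes

User-agent: Googlebot-Image
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def can_crawl(agent, path):
    """Return True if `agent` may fetch `path` (hypothetical helper)."""
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    logo = "/wp-content/themes/demo/logo.png"
    print("Googlebot-Image:", can_crawl("Googlebot-Image", logo))
    print("ExampleBot:", can_crawl("ExampleBot", logo))
```

In this sketch the image bot can reach a theme image that stays blocked for every other crawler.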
You can copy my robots.txt for your WordPress blog. You will see three more lines in my file:
# BEGIN XML-SITEMAP-PLUGIN
Sitemap: http://zacklive.com/sitemap.xml.gz
# END XML-SITEMAP-PLUGIN
These lines are generated by the XML-Sitemap plugin and tell robots where the sitemap file is. There is also a plugin, KB Robots.txt, designed to help bloggers with robots.txt. You may like to take a look, but I don’t think a plugin is necessary for the robots.txt file.
After you finish writing the robots.txt, you can go to Google Webmaster Tools to analyse it.