Blogger is equipped with a custom robots.txt facility. What is robots.txt on Blogger? “Robots.txt” is a file used to restrict the access of search engine robots (e.g. Google, Bing, Yahoo) that are exploring or opening your website. Before they crawl your web pages, they first check whether a robots.txt file exists; if it does, any commands in it can prevent them from accessing certain pages.
We need a robots.txt to control which pages the search engines (Google, Yahoo, Bing) index and to keep out the pages we do not want indexed. If we want all the content on our website indexed by the search engines, we do not need a robots.txt at all.
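On Blogger the file is served from the root of the blog, for example yourblog.blogspot.com/robots.txt. As an illustration, the default file Blogger generates typically looks something like the following (the address is a placeholder, and the exact Sitemap line may differ on your blog):

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://yourblog.blogspot.com/sitemap.xml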
Benefits of Robots.txt for Bloggers
In terms of SEO optimization, the ability of robots.txt to direct the “spider bot” to crawl targeted pages can be used to point it at important pages, such as content pages, so that the content is quickly indexed by the search engines.
When a spider bot decides to crawl the site, the site loses no small amount of bandwidth. For example, if we restrict access to a particular directory, such as the search directory, this saves bandwidth; crawl failures caused by poor access to the site are also reduced, so the spider bots can crawl the actual content freely and completely.
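On Blogger, the search results and label pages live under the /search directory, which is in fact what the default robots.txt blocks; a sketch of such a rule is:

User-agent: *
Disallow: /search
Allow: /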
On the other hand, if the pages a site has indexed in the search engines are judged to be of high quality, the site has a great chance of reaching the first page of results. The outcome will be different for a site that has many indexed pages but lower-quality content that tends to show a lot of links.
A few months ago, Blogger introduced the Search Preferences feature to manage various on-page SEO settings. One of the most important of these is the robots.txt customization. Not that the other Search Preferences features are unimportant, but they are all covered by a hack we have long used, namely meta tags, which serve the same function and to this day are much more effective.
First, for any friend who is still less familiar with robots.txt, I will give a little overview of robots.txt on Blogger. Robots.txt is used to tell crawling robots, whether they belong to search engines, aggregators, or other indexing services, that a website, directory, or specific file / web page SHOULD NOT be indexed. For example, if my friend does not want some blog pages (e.g. about, sitemap, labels, etc.) indexed by search engines, my friend can use commands in robots.txt so that those pages are not indexed. Originally robots.txt was used only to prohibit, with DISALLOW; as it developed, ALLOW was added as well.
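As a sketch, the page names below are only examples: on Blogger, static pages sit under the /p/ directory and label pages under /search/label/, so disallowing a couple of specific pages could look like this:

User-agent: *
Disallow: /p/about.html
Disallow: /search/label/
Allow: /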
Before the robots.txt feature existed, we used robots meta tags for this (noindex, nofollow). However, now that Blogger has introduced the custom robots.txt feature, we can control the indexers easily.
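For comparison, that meta tag approach is a single line placed in the <head> of the pages you want kept out of the index:

<meta name="robots" content="noindex, nofollow">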
Robots.txt basically contains two kinds of command lines: the first identifies the user agent (the crawler, or crawling robot), and the second gives the prohibition orders.
User-agent: *
Disallow:
The commands above are read as follows: the user-agent is filled with an asterisk, which refers to ALL crawlers, whether they belong to the search engines or to others such as feed aggregators (even autoblog robots!). Meanwhile, a Disallow that is left empty means that everything, the root directory, sub-directories, and files, MAY be accessed by the crawlers.
If you want to disallow crawler access to the entire website, fill the command with / (a slash), which means the robot crawlers cannot access any of the contents of the web / blog:
User-agent: *
Disallow: /
But if you want to disallow a particular directory or page from being indexed, write a / followed by the name of the directory. For example, if my friend does not want the crawler to index any of the static pages (which on Blogger live under the /p directory), the commands are:
User-agent: *
Disallow: /p
Allow: /
Allow: / is added to tell the crawler that the root directory, the other directories, and the other pages may be indexed. The meaning of the commands above is that the crawler should index every page except the static ones. Leaving Allow: / out is actually not a problem, but to be safe I introduce and recommend the command.
If you want to tell a particular search engine's own crawler not to index certain pages while the others still may, my friend has to add that crawler's user-agent name on another line. This example uses the Google bot:
User-agent: *
Disallow:

User-agent: Googlebot
Disallow: /p
Allow: /
Now, of course, bloggers can easily interpret the robots.txt commands above.
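To close, here is one possible combined custom robots.txt for a Blogger blog, purely a sketch that assumes you want the search/label pages and the static pages kept out of the index and that you replace the Sitemap address with your own:

User-agent: *
Disallow: /search
Disallow: /p
Allow: /

Sitemap: http://yourblog.blogspot.com/sitemap.xml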